ABSTRACT

We review major developments in the design of experiments, offer our
thoughts on important directions for the future, and make specific
recommendations  for  experimenters  and  statisticians  who  are  students  and 
teachers  of  experimental  design,  practitioners  of  experimental  design,  and 
researchers  jointly  exploring  new  frontiers.  Specific  topics  covered  are 
optimal  design,  computer-aided  design,  robust  design,  response  surface  design, 
mixture  design,  factorial  design,  block  design,  and  designs  for  nonlinear 
models.

SIGNIFICANCE AND EXPLANATION


Statistics  is  concerned  not  just  with  the  analysis  of  data,  but  also  with 
how  data  are  collected.  Such  concern  is  quite  natural,  for  analysis  cannot  be 
truly  successful  unless  the  data  are  informative,  and  the  best  way  to  assure 
informative  data  is  to  apply  statistical  principles  to  the  way  in  which  they 
are  collected.  One  area  in  which  statistics  has  made  notable  contributions  to 
collecting  informative  data  is  experimental  design.  This  paper  provides  a 
general,  non-technical  review  of  statistical  work  in  experimental  design, 
discusses  important  directions  for  future  research,  and  makes  some  specific 
recommendations.

The  focus  of  the  paper  is  on  those  areas  of  experimental  design  that  are 
most  useful  in  the  physical,  chemical,  and  engineering  sciences.  Specific 
topics  covered  are: 

1.  Optimal  design,  in  which  specific  criteria  are  developed  and  studied  in  an 
attempt  to  derive  designs  that  will  provide  the  experimenter  with  the  most 
precise  inferences  possible. 

2.  Computer-aided  design,  in  which  computer  algorithms  that  find  optimal 
designs,  or  designs  with  other  desired  properties,  are  developed. 

3.  Robust  design,  which  studies  the  sensitivity  of  experimental  designs  to 
departures  from  assumptions  on  which  those  designs  were  based. 

4.  Response  surface  designs,  which  exploit  a  simple,  sequential  experimental 
strategy  to  explore  the  relationship  between  a  response  variable  and  several 
continuous  inputs. 

5.  Mixture  designs,  which  provide  strategies  for  modeling  a  response  that 
depends  on  the  relative  amounts  of  several  continuous  inputs. 

6.  Factorial  design,  which  emphasizes  the  usefulness  of  varying  several 
factors  simultaneously  in  an  experiment,  rather  than  just  one  factor  at  a 
time,  and  proposes  economical  strategies  to  do  so. 

7.  Block  designs,  which  offer  efficient  schemes  for  comparing  several 
different  treatments. 

8.  Designs  for  nonlinear  models,  which  suggest  useful  ways  to  design 
experiments  when  the  response  is  assumed  to  be  a  nonlinear  function  of  unknown 
parameters.

The  paper  concludes  by  addressing  several  recommendations  to 
experimenters  and  statisticians  to  encourage  the  increased  use  of 
statistically  designed  experiments  and  to  facilitate  the  interchange  of  ideas 
between  statisticians  and  experimenters  that  is  an  essential  stimulus  to 
future  research  in  this  area. 


The  responsibility  for  the  wording  and  views  expressed  in  this  descriptive 
summary lies with MRC, and not with the authors of this report.

EXPERIMENTAL DESIGN: REVIEW AND COMMENT
David  M.  Steinberg  and  William  G.  Hunter 

1.  INTRODUCTION 

Fisher's pioneering work at Rothamsted Experimental Station in the 1920's
and  1930's  firmly  established  the  role  of  statistics  in  experimental  design 
and,  vice  versa,  the  role  of  experimental  design  in  statistics.  His 
monumental  work  was  guided  by  the  key  insight  that  statistical  analysis  of 
data  could  be  informative  only  if  the  data  themselves  were  informative,  and 
that  informative  data  could  best  be  assured  by  applying  statistical  ideas  to 
the  way  in  which  the  data  were  collected  in  the  first  place.  In  the  process, 
Fisher  radically  altered  the  role  of  the  statistician:  from  one  of 
after-the-fact  technician  to  one  of  active  collaborator  at  all  stages  of  an 
investigation. 

Fisher  was  employed  to  analyze  data  from  studies  conducted  at  Rothamsted, 
but  he  soon  realized  that  some  important  questions  could  not  be  answered 
because  of  inherent  weaknesses  in  the  planning  of  many  of  the  experiments.  In 
fact,  in  one  particularly  unfortunate  instance,  he  said  that  the  only  analysis 
he  could  perform  was  a  post-mortem  to  find  out  why  the  study  had  died.  Box 
(1980)  described  Fisher's  work  on  the  design  of  experiments,  and  how  much  of 
it  was  inspired  by  problems  of  field  experimentation:  he  developed  his 
insights  concerning  randomization,  blocking,  and  replication;  he  invented  new 
classes  of  experimental  designs;  he  worked  together  with  scientists  who 
applied  his  ideas  in  their  experiments;  by  mail,  he  advised  experimenters  in 
other places; and he wrote about his ideas to help investigators realize
richer harvests of information from their investments in experimental work.

The  collaborative  way  in  which  Fisher  worked  with  the  scientists  at 
Rothamsted,  aiding  them  in  their  experimental  research  and  then  using  his 
experiences  as  the  motivation  for  important  statistical  research,  still  serves 
as  a  model  for  statisticians  to  emulate.  For  a  more  detailed  account  of 
Fisher's  work  at  that  time,  see  Yates  and  Mather  (1963),  Yates  (1964),  Box 
(1978),  and  the  tributes  to  Fisher  in  Biometrics  (1962,  Volume  18,  437-454). 

Since  Fisher  first  introduced  statistical  principles  of  experimental 
design,  much  useful  statistical  research  has  been  done.  Our  primary  purpose 
in  this  article  is  to  provide  a  summary  of  selected  work  in  experimental 
design,  rather  than  an  exhaustive  review  of  the  literature,  and  to  offer  some 
thoughts about future directions. Wherever possible, we refer the reader
to books that discuss the basic ideas of experimental design and present many
of the most widely used plans, and to good review articles that summarize
research on experimental design; we highlight only the most recently published
results. (See Hahn 1982b for a useful review of available books on
experimental design.) We will focus especially on the design of experiments
in  the  physical,  chemical,  and  engineering  sciences. 

During  the  last  quarter  century,  many  papers  on  the  design  of  experiments 
have appeared in Technometrics. Given the jubilee nature of this article, we
felt  it  would  be  appropriate  to  begin  by  reading  those  papers,  a  task  that 
proved  both  enjoyable  and  rewarding.  Our  summary  of  this  research,  which  is 
presented  in  Section  2,  provides  a  perspective  from  which  to  evaluate  current 
research.  In  Sections  3-10  we  review  research  in  several  major  areas: 
optimal  design,  computer-aided  design,  design  robustness  and  design 
sensitivity,  response  surface  designs,  mixture  designs,  factorial  designs, 
block designs, and designs for nonlinear models. In Section 11 we discuss
some topics that we believe deserve further study, and we offer some personal
reflections on future directions. We conclude in Section 12 with some
recommendations for experimenters and statisticians.

2.  EXPERIMENTAL  DESIGN  IN  TECHNOMETRICS 

Experimental design has always been a prominent topic in Technometrics.
Figure  1  shows  the  percentage  of  pages  in  Technometrics  devoted  to  this 
subject  on  a  yearly  basis  from  1959  through  1982.  (We  have  included  here  all 
papers  and  notes  whose  principal  focus  is  on  the  theory  or  technique  of 
designing  experiments,  excluding  papers  that  deal  only  with  the  analysis  of 
particular  types  of  designs.  We  have  not  counted  book  reviews,  letters  to  the 
editor,  corrigenda,  and  other  editorial  material.)  The  first  years  of 
Technometrics  witnessed  a  profusion  of  articles  on  experimental  design,  which 
occupied 20% to 30% of the space in the journal. While such a high concentration
has not been maintained in subsequent years, the percentage has still
consistently  exceeded  10%.  By  contrast,  the  Journal  of  the  American 
Statistical  Association  devoted  less  than  1%  of  its  pages  in  1982  to  articles 
on  experimental  design;  the  corresponding  figure  for  the  Annals  of  Statistics 
in  1982  was  4%. 

Thus  experimental  design  has  clearly  been  a  topic  of  major  interest  for 
Technometrics. An informative picture of the development of research on
experimental  design  and  its  applications  emerges  from  an  examination  of  these 
articles.  The  first  issues  of  Technometrics  included  many  articles  on 
factorial  and  fractional  factorial  designs  and  block  designs,  topics  that  were 
originally  explored  in  the  context  of  agricultural  experimentation.  Their 
appearance  in  Technometrics  marked  a  realization  that  these  concepts  are  also 
important in the physical, chemical, and engineering sciences. Moreover, the
confrontation of existing ideas in experimental design with new areas of
application sparked creativity. Innovative modifications and extensions of
classical experimental designs were developed and many useful articles were
published in a short time. Following this initial period of enthusiasm,
articles on these particular topics have continued to appear, but only
sporadically.

[Figure 1: Experimental Design in TECHNOMETRICS. Percentage of pages by year; horizontal axis ticks at 1960, 1965, 1970, 1975, 1980.]

One  topic  that  has  received  continuing  attention  in  Technometrics  is  the 
design  of  response  surface  experiments.  Response  surface  methodology  was 
stimulated  by  problems  arising  in  chemistry  and  chemical  engineering,  in 
particular  how  to  improve  the  performance  of  systems  by  modifying  the  settings 
of  process  variables  (Box  and  Wilson  1951).  The  strategy  advocated  was  to  use 
a  sequence  of  simple  experimental  designs  to  locate  and  then  explore  regions 
that  promised  high  levels  of  performance.  The  basic  building  blocks  were 
directly  borrowed  from  or  were  extensions  of  the  classical  factorial  designs 
initially  used  in  agriculture  and  biology.  The  new  conceptual  framework 
offered  by  response  surface  methodology,  especially  its  appeal  to  geometric 
ideas,  stimulated  much  new  research.  Technometrics  was  a  natural  forum  for 
the  discussion  of  these  new  ideas.  Response  surface  methodology  provides  an 
example  of  the  type  of  stimulation  that  was  provided  by  the  appearance  of  this 
new  journal  in  the  statistics  literature.  A  steady  flow  of  articles  on 
response  surface  design  appeared  throughout  the  1960's  and  into  the  1970's, 
but  has  abated  in  recent  years. 

Three  new  subjects  assumed  prominence  in  the  1970's:  optimal  design, 
computer-aided  design,  and  mixture  design.  Two  related  topics  that  have  come 
to the fore in the last 10 years are design robustness and design
sensitivity. Research work in robustness and sensitivity is important because
experiments  sometimes  must  be  planned  in  the  face  of  a  considerable  degree  of 
model uncertainty.

The emergence of optimal design as a central concern can be seen quite
clearly in Technometrics: through 1970 only two articles dealt explicitly and
primarily  with  optimal  design,  but  since  1970  that  number  has  increased  to 
more  than  a  dozen.  Interest  in  computer-aided  design  has  grown  for  two 
reasons:  advances  in  computer  technology  and  the  increasing  influence  of 
optimal  design  theory.  Much  of  the  work  in  computer-aided  design  has  gone 
into  the  development  of  powerful  algorithms  for  finding  optimal  designs  and 
other  designs  with  certain  desired  properties  (e.g.,  orthogonality  of  some 
factor  effects,  or  particular  confounding  patterns).  Mixture  designs, 
although  first  discussed  in  the  1950's,  received  little  attention  prior  to 
1970; however, they have stimulated great interest over the last 10 years,
including more than 15 articles in Technometrics.


3.  OPTIMAL  DESIGN 

The traditional motivation underlying the theory of optimal design is
that  experiments  should  be  designed  to  achieve  the  most  precise  statistical 
inference  possible.  Kiefer  (1981)  stated  that  research  work  on  optimal  design 
arose  in  part  as  a  reaction  to  earlier  research  on  design,  which  emphasized 
attractive  combinatoric  properties  rather  than  inferential  properties.  Design 
optimality  was  first  considered  by  Smith  (1918),  and  early  work  in  the  subject 
was  done  by  Wald  (1943),  Hotelling  (1944),  and  Elfving  (1952).  The  major 
contributions  to  the  area,  however,  were  made  by  Kiefer  (1958,  1959)  and 
Kiefer  and  Wolfowitz  (1959,  1960),  who  synthesized  and  greatly  extended  the 
previous  work.  Although  the  ideas  of  optimal  design  initially  generated 
considerable  controversy  (see,  for  example,  the  discussion  accompanying  the 
paper  by  Kiefer  1959),  they  have  since  become  well-established  in  the 
statistical literature. In some areas, such as the design of block
experiments, the use of optimal design theory is now accepted as a fundamental
tool  for  comparing  designs  (see  Section  9).  In  other  areas,  however,  there  is 
still  disagreement  over  the  applicability  of  optimal  design  theory  (see,  for 
example,  the  discussion  in  Section  6  on  response  surface  designs). 

Excellent  reviews  of  research  work  on  optimal  design  have  appeared.  For 
readers  interested  in  the  most  recent  developments  in  optimal  design,  we 
recommend  the  reviews  by  Atkinson  (1982),  Pazman  (1980),  and  Ash  and  Hedayat 
(1978).  The  review  by  St.  John  and  Draper  (1975)  provides  a  good  introduction 
to  the  topic.  The  recent  book  by  Silvey  (1980)  presents  a  concise  summary  of 
the  classical  results  in  optimal  design  theory,  and  the  book  by  Fedorov  (1972) 
is  a  valuable  compendium  of  results. 

The  influence  of  optimal  design  has  extended  to  almost  all  areas  of 
experimental design, and it will be useful to review some of the most basic
definitions  and  results  because  they  will  be  needed  in  subsequent  sections. 

To  apply  optimal  design  theory  in  practice  requires  a  criterion  for  comparing 
experiments and an algorithm for optimizing the criterion over the set of
possible  experimental  designs.  We  will  define  the  most  commonly  used  criteria 
here  but  will  defer  the  consideration  of  algorithms  to  Section  4.  The 
classical  criteria  are  derived  within  the  context  of  linear  model  theory  in 
which  it  is  assumed  that  the  experimental  data  can  be  represented  by  the 
equation 

    $Y_i = f(x_i)'\beta + \varepsilon_i$,    (3.1)

where $Y_i$ is the measured response from the ith experimental run, $x_i$ is a
vector of predictor variables for the ith run, $f$ is a vector of p
functions that model how the response depends on $x_i$, $\beta$ is a vector of p
unknown parameters, and $\varepsilon_i$ is the experimental error for the ith run.


A natural way to measure the quality of statistical inference with
respect to a single parameter is in terms of the variance of the parameter
estimate. If the errors are uncorrelated and have constant variance $\sigma^2$, the
variance-covariance matrix of the least squares estimator $\hat{\beta}$ is

    $\mathrm{var}(\hat{\beta}) = \sigma^2 (X'X)^{-1}$,    (3.2)

where $X$ is the n×p matrix whose ith row is $f(x_i)'$. We will limit our
discussion here to the case where $X$ has full column rank. (Mathematically,
the theory is not substantially different if $X$ is not of full rank.)

Another useful way to measure the quality of inference is in terms of the
variance of the estimated response at $x$, which, from (3.1), is given by

    $d(x) = \sigma^2 f(x)'(X'X)^{-1} f(x)$.    (3.3)

Both (3.2) and (3.3) depend on the experimental design only through the p×p
matrix $(X'X)^{-1}$, and suggest that a good experimental design will be one that
makes this matrix small in some sense. Since there is no unique size ordering
of the p×p matrices, various real-valued functionals have been suggested as
measures of "smallness." The most popular of these optimality criteria are
listed below:

1. D-Optimality - A design is said to be D-optimal if it minimizes
$\det[(X'X)^{-1}]$, where det denotes determinant.

2. A-Optimality - A design is said to be A-optimal if it minimizes
$\mathrm{tr}[(X'X)^{-1}]$, where tr denotes trace.

3. E-Optimality - A design is said to be E-optimal if it minimizes the
maximal eigenvalue of $(X'X)^{-1}$.

4. G-Optimality - A design is said to be G-optimal if it minimizes
$\max d(x)$, where the maximum is taken over all possible vectors $x$ of
predictor variables.

5. $I_\lambda$-Optimality - A design is said to be $I_\lambda$-optimal if it minimizes
$\int d(x)\,\lambda(dx)$, where $\lambda$ is a probability measure on the space of predictor
variables.  This  criterion,  which  is  sometimes  called  average  integrated 
variance,  also  belongs  to  a  more  general  class  of  L-optimality  criteria 
discussed  by  Fedorov  (1972). 
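
The five criteria above can be computed directly from (3.2) and (3.3). The following is a minimal sketch, not taken from any of the programs reviewed here. It assumes a one-factor quadratic model with $f(x)' = (1, x, x^2)$, approximates the predictor space by a grid on [-1, 1], and takes $\lambda$ to be the uniform measure on that grid. The names `model_matrix`, `criteria`, and `region` are hypothetical.

```python
import numpy as np

def model_matrix(points):
    """Rows f(x)' = (1, x, x^2): a one-factor quadratic model (assumed)."""
    x = np.asarray(points, dtype=float)
    return np.column_stack([np.ones_like(x), x, x**2])

def criteria(design, region, sigma2=1.0):
    """Criterion values for an exact design; smaller is better for each."""
    X = model_matrix(design)
    V = np.linalg.inv(X.T @ X)                       # (X'X)^{-1}
    F = model_matrix(region)
    d = sigma2 * np.einsum("ij,jk,ik->i", F, V, F)   # d(x) from (3.3)
    return {
        "D": np.linalg.det(V),             # det of (X'X)^{-1}
        "A": np.trace(V),                  # trace of (X'X)^{-1}
        "E": np.linalg.eigvalsh(V).max(),  # largest eigenvalue of (X'X)^{-1}
        "G": d.max(),                      # maximum of d(x) over the region
        "I": d.mean(),                     # lambda = uniform measure on `region`
    }

region = np.linspace(-1.0, 1.0, 21)   # grid standing in for the interval [-1, 1]
endpoint_centre = criteria([-1, -1, 0, 0, 1, 1], region)
evenly_spaced = criteria([-1, -0.6, -0.2, 0.2, 0.6, 1], region)
```

Under these assumptions the six-run design that places two runs at each of -1, 0, and 1 has smaller D- and G-criterion values than the six evenly spaced runs, which illustrates how the criteria rank competing designs for a fixed model.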

One  important  result  in  optimal  design  theory  is  the  general  equivalence 
theorem  (Kiefer  and  Wolfowitz  1960),  which  links  D-  and  G-optimality.  The 
theorem  is  phrased  in  terms  of  design  measure,  in  which  a  design  is 
represented  by  a  probability  measure  on  the  predictor  variable  space.  Thus, 
for  example,  a  trial  of  n  runs  (an  "exact"  design)  would  be  represented  as  a 
discrete  measure  with  mass  1/n  at  each  of  the  n  points  of  the  design.  The 
concept  of  design  measure  is  useful  in  studying  optimal  design  theory  from  a 
mathematical  point  of  view  because  it  replaces  a  discrete  optimization  problem 
(finding  the  optimal  "exact"  design)  with  a  continuous  problem  (finding  the 
optimal  design  measure)  which  is  often  easier  to  solve.  Although  the  solution 
to  the  continuous  problem  might,  in  theory,  be  a  measure  with  infinitely  many 
support  points,  Kiefer  and  Wolfowitz  (1960)  showed  that  solutions  could  always 
be limited to measures with finitely many support points; the value of the
measure  at  each  point  would  then  give  the  optimal  proportion  of  runs  that 
should  be  made  there.  The  General  Equivalence  Theorem  states  that  among 
design measures $\xi$, the following three conditions are equivalent:

1. $\xi^*$ is D-optimum.

2. $\xi^*$ is G-optimum.

3. $\max_x d(x, \xi^*) = p$.

The third condition, which provides a simple way to check whether a design is
D-  and  G-optimal,  is  useful  in  constructing  such  designs. 
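
As a small numerical illustration of the third condition, the sketch below evaluates the standardized variance of a design measure, $f(x)'M^{-1}f(x)$ with $M = \sum_i w_i f(x_i) f(x_i)'$, which is the form of $d$ usually used with design measures ($\sigma^2 = 1$, weights summing to one). The quadratic model on [-1, 1], the grid, and the function names are assumptions made for illustration only.

```python
import numpy as np

def f(x):
    """Model vector for one-factor quadratic regression (assumed model)."""
    return np.array([1.0, x, x * x])

def standardized_variance(x, support, weights):
    """f(x)' M^{-1} f(x), with M the information matrix of the design measure."""
    M = sum(w * np.outer(f(s), f(s)) for s, w in zip(support, weights))
    return f(x) @ np.linalg.solve(M, f(x))

def max_variance(support, weights, grid):
    """Largest standardized variance over a grid of predictor values."""
    return max(standardized_variance(x, support, weights) for x in grid)

grid = np.linspace(-1.0, 1.0, 201)
p = 3   # number of parameters in the quadratic model
equal_mass = max_variance([-1, 0, 1], [1 / 3, 1 / 3, 1 / 3], grid)
unequal_mass = max_variance([-1, 0, 1], [0.25, 0.5, 0.25], grid)
```

For the measure with equal mass at -1, 0, and 1 the maximum equals p = 3, so that measure satisfies the third condition. Moving mass to the centre raises the maximum above p, so the second measure is neither D- nor G-optimal.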


4.  COMPUTER-AIDED  DESIGN  OF  EXPERIMENTS 


Research  on  the  use  of  computers  in  the  design  of  experiments  has  been 
closely  related  to  the  increasing  attention  given  to  optimal  design  in  the 
literature.  As  described  in  the  previous  section,  the  basic  idea  of  optimal 
design  is  usually  to  choose  a  design  that  optimizes  some  inference  criterion 
over  the  set  of  designs  being  considered.  In  practice,  this  optimization 
problem  may  be  difficult  or  impossible  to  solve  analytically.  The  first 
research  done  on  using  the  computer  as  an  essential  aid  in  tackling  this 
problem  in  experimental  design  was  apparently  that  by  Box  and  Hunter 
(1965a,b) on design for nonlinear models. The remainder of the present
section  will  emphasize  the  use  of  computers  in  the  design  of  linear  regression 
and  factorial  experiments.  We  will  discuss  the  two  topics  in  turn. 

4.1  Regression  Experiments

Much  research  has  been  concerned  with  the  development  of  constructive 
algorithms  that  can  be  used  to  find  optimal  or  near-optimal  designs.  Initial 
work  on  the  development  of  such  algorithms  focused  on  finding  D-  and  G-optimal 
design measures (Wynn 1972, Fedorov 1972, Atwood 1973). These algorithms
involve  the  construction  of  a  sequence  of  design  measures  in  which  each 
succeeding  measure  is  a  convex  combination  of  the  current  measure  and  a  point 
mass  whose  location  is  chosen  with  the  aid  of  the  third  condition  of  the 
General  Equivalence  Theorem  stated  in  Section  3.  General  conditions  for  such 
algorithms  to  converge  to  an  optimal  design  measure  were  given  by  Wu  and  Wynn 
(1978). 
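
The iteration described above can be sketched in a few lines. This is a simplified illustration in the spirit of those algorithms, not a reproduction of any published one. It assumes a finite grid of candidate points, a uniform starting measure, and a harmonic step length $\alpha_k = 1/(k+2)$; the function names are hypothetical.

```python
import numpy as np

def sequential_design_measure(F, n_iter=2000):
    """Weights on the candidate rows of F after n_iter convex-combination steps."""
    N, p = F.shape
    w = np.full(N, 1.0 / N)                 # start from the uniform measure
    for k in range(n_iter):
        M = F.T @ (w[:, None] * F)          # information matrix of the measure
        d = np.einsum("ij,jk,ik->i", F, np.linalg.inv(M), F)
        j = int(np.argmax(d))               # point with the largest variance
        alpha = 1.0 / (k + 2)               # step length (an assumption)
        w *= 1.0 - alpha                    # shrink the current measure ...
        w[j] += alpha                       # ... and add a point mass at x_j
    return w

def max_standardized_variance(F, w):
    """Largest value of f(x)' M^{-1} f(x) over the candidate rows of F."""
    M = F.T @ (w[:, None] * F)
    return float(np.einsum("ij,jk,ik->i", F, np.linalg.inv(M), F).max())

x = np.linspace(-1.0, 1.0, 21)
F = np.column_stack([np.ones_like(x), x, x**2])   # quadratic model (assumed)
weights = sequential_design_measure(F)
gap = max_standardized_variance(F, weights) - F.shape[1]
```

The quantity `gap` is the amount by which the maximum variance exceeds p. By the third condition of the General Equivalence Theorem it is zero only at an optimal measure, so it serves as a natural stopping diagnostic.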

Computer-aided  design  of  regression  experiments  was  stimulated  by  the 
desire  to  achieve  exact  n-run  optimal  designs.  In  some  cases  good  designs  can 
be found directly from an optimal design measure by spreading out the runs to
approximate the optimal allocation. For designs with a small number of runs
or models with many parameters, however, this strategy may be difficult to
implement or may lead to designs that are quite inefficient.

Improvements in computer technology have made it possible to adopt an
alternative scheme: developing computer programs to directly find exact n-run
optimal designs. The most popular computer algorithm developed to date, known
as DETMAX, was originated by Mitchell (1974a,b) to find D-optimum designs.
This program requires the user to specify the model and the number of
experimental runs (n), and to list all the possible design points for the
experiment. An initial n-run starting design may be supplied by the user or
generated by the program. The program then seeks to maximize $\det(X'X)$,
which is equivalent to minimizing $\det[(X'X)^{-1}]$, by adding and deleting design
points until a convergence criterion is satisfied. The choice of which point
to add or delete at each step is made so that $\det(X'X)$ is maximized among
all possibilities. Galil and Kiefer (1980b) showed that this criterion will
always add a design point at which the variance of the estimated response
(equation 3.3) is greatest; this property is related to the result of the
general  equivalence  theorem  that  the  variance  of  the  estimated  response  for  an 
optimal  design  measure  obtains  its  maximum  value  at  each  of  the  design 
points.  The  DETMAX  program  allows  the  possibility  of  "excursions,"  in  which 
several  points  are  added  and  then  several  points  deleted,  in  the  hope  of 
avoiding  local  maxima.  Mitchell  (1974a)  also  recommended  that  a  number  of 
different  starting  designs  be  used  because  no  one  starting  design  is 
guaranteed  to  lead  to  an  optimum  design.  Several  characteristics  are 
calculated  for  designs  at  or  near  the  D-optimal  design  and  these  properties 
may  be  used  as  additional  bases  of  comparison. 
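
The add-and-delete idea can be illustrated with a much-simplified sketch. It is not Mitchell's DETMAX program: it performs only single add-then-delete steps, has no excursions, and uses a single starting design. The candidate grid, the quadratic model, and all names are assumptions made for illustration.

```python
import numpy as np

def variance_function(F_design, F_points):
    """f(x)'(X'X)^{-1} f(x) for each row of F_points."""
    V = np.linalg.inv(F_design.T @ F_design)
    return np.einsum("ij,jk,ik->i", F_points, V, F_points)

def exchange_search(F_cand, n, start, max_iter=100):
    """Add-then-delete search for an n-run design with large det(X'X).

    F_cand : model matrix of all candidate points (N x p)
    start  : list of n candidate indices giving a nonsingular starting design
    """
    design = list(start)
    best = np.linalg.det(F_cand[design].T @ F_cand[design])
    for _ in range(max_iter):
        # Add the candidate with the largest variance of the estimated response.
        add = int(np.argmax(variance_function(F_cand[design], F_cand)))
        trial = design + [add]
        # Delete the run of the (n+1)-run design with the smallest variance.
        drop = int(np.argmin(variance_function(F_cand[trial], F_cand[trial])))
        trial.pop(drop)
        det = np.linalg.det(F_cand[trial].T @ F_cand[trial])
        if det <= best * (1 + 1e-10):
            break                           # no improvement: stop
        design, best = trial, det
    return sorted(design), best

x = np.linspace(-1.0, 1.0, 21)
F_cand = np.column_stack([np.ones_like(x), x, x**2])   # quadratic model (assumed)
start = [0, 4, 8, 12, 16, 20]                          # six evenly spaced runs
start_det = np.linalg.det(F_cand[start].T @ F_cand[start])
design, det = exchange_search(F_cand, n=6, start=start)
```

Because a search of this kind can stop at a local maximum, it would in practice be repeated from several starting designs, as recommended above.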

Mitchell (1974b) used the DETMAX program to tabulate designs for
first-order  regression  models  with  up  to  nine  factors  and  a  variety  of  sample 
sizes.  Mitchell  and  Bayne  (1978)  used  the  program  to  find  fractions  of 
three-level  factorial  designs  for  models  including  some  two-factor 
interactions  and  for  models  including  two-factor  interactions  and  pure 
quadratic  terms. 

Galil  and  Kiefer  (1980b)  developed  useful  modifications  to  DETMAX,  which 
led  to  a  substantial  reduction  in  the  amount  of  time  needed  to  search  for  an 
optimal  design  and  in  the  amount  of  computer  space  required  by  the  program. 
They  also  proposed  a  systematic  method  for  generating  an  initial  design.  The 
reduction  in  time  is  quite  important  because  it  allows  many  more  starting 
designs  to  be  used  for  a  fixed  computer  budget,  thereby  increasing  the  chance 
of  finding  an  optimal  exact  design.  The  space-saving  methods  make  it  possible 
to  study  larger  problems.  Galil  and  Kiefer  also  studied  in  detail  the  problem 
of  quadratic  regression  for  designs  that  are  fractions  of  three-level 
factorials  and  tabled  the  best  designs  found  by  the  modified  DETMAX  algorithm. 

A  new  program  developed  by  Welch  (1982)  takes  advantage  of  the 
branch-and-bound  optimization  strategy.  This  program  is  more  powerful  than 
DETMAX in that it is guaranteed to find all possible n-run optimal designs for a
given  model  and  a  specified  set  of  possible  design  points.  It  is  not  clear, 
however,  what  additional  cost  in  Central  Processing  Unit  (CPU)  time  may  be 
involved.  Welch  also  considered  in  detail  the  problem  of  quadratic  regression 
with  three-level  factors  and  proved  that  some  of  the  designs  tabled  by  Galil 
and  Kiefer  (1980b)  were,  in  fact,  D-optimal. 

Snee  and  Marquardt  (1974)  considered  the  special  problem  of  optimal 
design  for  mixture  experiments  (see  Section  7).  Their  XVERT  program  was 
designed to find extreme vertices of the design region and to calculate
several optimality criteria for a variety of extreme vertex designs. Snee
(1979)  described  the  CONSIM  algorithm  for  finding  extreme  vertices  and 
centroids  of  mixture  design  regions  and  recommended  that  it  be  combined  with 
XVERT  (when  there  are  at  most  four  mixture  components)  or  with  DETMAX  (when 
there  are  five  or  more  components)  to  generate  experimental  designs.  Nigam, 
Gupta,  and  Gupta  (1983)  proposed  a  modified  version  of  the  XVERT  algorithm  for 
finding  extreme  vertices  of  mixture  design  regions;  the  modified  algorithm 
involves  less  computational  effort  than  does  XVERT  but  the  authors  found  that 
there  was  little  loss  of  efficiency. 

One  of  the  common  features  of  the  above  papers  is  their  use  of  a  design 
region  with  a  finite  number  of  possible  design  points,  rather  than  a 
continuous  region  with  infinitely  many  points.  The  primary  reason  for 
limiting  the  algorithms  to  finite  design  spaces  is  to  simplify  the  task  of 
selecting  what  point  to  add  to  an  existing  design.  The  benefit  of  each 
candidate  point  can  be  computed  and  the  best  point  is  then  selected.  The  use 
of  a  finite  design  region  is  reasonable  on  practical  grounds  even  when  some  of 
the  factors  are  continuous,  quantitative  variables  because  an  experimenter's 
ability  to  exactly  fix  the  levels  of  quantitative  factors  in  an  experiment  is 
limited.  If,  however,  there  are  many  possible  settings  for  each  quantitative 
factor,  so  that  the  number  of  design  points,  although  finite,  is  quite  large, 
a  more  efficient  strategy  may  be  to  treat  the  design  region  as  though  it  were 
continuous.  To  choose  the  new  design  point  from  a  continuous  region,  a 
functional  optimization  algorithm  must  be  implemented  and  the  success  of  the 
design  algorithm  will  depend,  at  least  in  part,  on  the  ability  of  the 
optimization algorithm to find the best point to add at each iteration.

Cook and Nachtsheim (1980) used Powell's (1964) conjugate direction
method to maximize $\det(X'X)$ at each iteration and compared several computer
design  algorithms.  Not  surprisingly,  they  found  that  the  best  results  were 
obtained  by  those  algorithms  that  required  the  most  CPU  time.  They  concluded 
that  DETMAX  (without  Galil  and  Kiefer's  modifications)  gave  good  results 
relative  to  the  amount  of  CPU  time  it  required. 

Evans  (1979)  presented  a  simple  computer  algorithm  for  augmenting  an 
existing  experimental  design  by  a  fixed  number  of  runs  so  that  the  combined 
design  would  be  D-optimal.  His  algorithm  called  for  simultaneously  choosing 
all  the  new  design  points,  rather  than  the  sequential  selection  characteristic 
of  other  methods,  and  used  a  modified  version  of  Nelder  and  Mead's  (1965) 
simplex method to maximize $\det(X'X)$. Both DETMAX and Welch's algorithm also
allow  the  user  to  augment  an  existing  design,  although  within  the  framework  of 
a  finite  design  space. 
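
Augmentation within a finite design space can be sketched as follows. The sketch adds runs one at a time and is therefore not Evans's simultaneous method. It relies on the standard identity $\det(X'X + ff') = \det(X'X)\,[1 + f'(X'X)^{-1}f]$, so the best single run to add is the candidate with the largest variance of the estimated response. The first-order model in two factors and the names below are assumptions made for illustration.

```python
import numpy as np

def augment(F_existing, F_cand, n_new):
    """Greedily add n_new candidate rows, each maximizing det(X'X)."""
    X = np.array(F_existing, dtype=float)
    chosen = []
    for _ in range(n_new):
        V = np.linalg.inv(X.T @ X)
        d = np.einsum("ij,jk,ik->i", F_cand, V, F_cand)   # f'(X'X)^{-1} f
        j = int(np.argmax(d))          # largest variance = largest det gain
        chosen.append(j)
        X = np.vstack([X, F_cand[j]])
    return chosen, X

# Candidate points: the four corners of a 2^2 factorial.
corners = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1]], dtype=float)
F_cand = np.column_stack([np.ones(4), corners])   # f(x)' = (1, x1, x2)
F_existing = F_cand[:3]                           # three runs already made
det_before = np.linalg.det(F_existing.T @ F_existing)
chosen, X_new = augment(F_existing, F_cand, n_new=1)
det_after = np.linalg.det(X_new.T @ X_new)
```

In this example the existing runs occupy three corners, the missing corner has variance 3 (with $\sigma^2 = 1$), and adding it multiplies the determinant by 1 + 3, from 16 to 64, which completes the full factorial.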

Johnson  and  Nachtsheim  (1983)  studied  several  problems  in  the 
construction  of  designs  on  continuous,  convex  design  spaces.  They  concluded 
that  Evans's  approach  of  simultaneously  searching  for  all  the  new  points  to  be 
added  to  an  existing  design  offered  little  improvement  over  sequential  search 
procedures.  They  compared  several  optimization  algorithms  for  choosing  the 
point that maximizes $\det(X'X)$ and found that Powell's (1964) algorithm gave
the  best  results.  Finally,  they  found  that  Galil  and  Kiefer's  (1980b)  method 
of  generating  an  initial  design  was  quite  successful. 

One of the first articles on computer-aided design of regression
experiments  took  an  approach  quite  different  from  those  described  previously. 
Kennard  and  Stone  (1969)  argued  that  a  good  design  should  cover  the  design 
space  as  uniformly  as  possible.  They  developed  the  CADEX  algorithm  to  achieve 
this goal by sequentially choosing that point furthest from the current design
points. They favored the use of this uniform coverage criterion because it
does  not  require  the  assumption  of  any  particular  model  such  as  (3.1)  for  the 
response  and  because,  when  several  response  variables  are  measured,  the  same 
design  will  be  appropriate  for  each  one. 
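
The uniform coverage idea can be sketched as a sequential maximin-distance selection over a finite candidate list. This is an illustration of the criterion as described above, not the CADEX program itself. The choice of starting point, the Euclidean distance, and the function name are assumptions.

```python
import numpy as np

def uniform_coverage(candidates, n, start=0):
    """Pick n candidate indices, each furthest from the points already chosen."""
    pts = np.asarray(candidates, dtype=float)
    chosen = [start]                     # starting point (an assumption)
    # Distance from every candidate to its nearest chosen point.
    nearest = np.linalg.norm(pts - pts[start], axis=1)
    while len(chosen) < n:
        j = int(np.argmax(nearest))      # candidate furthest from the design
        chosen.append(j)
        nearest = np.minimum(nearest, np.linalg.norm(pts - pts[j], axis=1))
    return chosen

levels = [-1.0, -0.5, 0.0, 0.5, 1.0]
candidates = [(a, b) for a in levels for b in levels]   # 5 x 5 grid on the square
chosen = uniform_coverage(candidates, n=5)
points = sorted(candidates[i] for i in chosen)
```

Starting from one corner of the square, the first five points selected are the four corners and the centre. No model for the response enters the calculation, which is the property emphasized in the text.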

4.2  Factorial  Experiments 

Computer  algorithms  have  also  been  developed  to  aid  in  the  design  of 
factorial  experiments.  Patterson  (1976)  described  the  DSIGN  program,  which 
produces  designs  for  factors  at  any  number  of  levels  with  a  variety  of 
blocking  structures,  including  Latin  squares  and  split  plots,  according  to  a 
generating  design  key  supplied  by  the  user.  The  design  key  specifies  the  plot 
aliases  of  the  main  effects  of  treatment  factors.  Bailey,  Gilchrist,  and 
Patterson  (1977)  and  Patterson  and  Bailey  (1978)  described  the  use  of  design 
keys  in  identifying  confounding  patterns  and  in  constructing  designs.  The 
designs  produced  by  the  DSIGN  program  are  compared  on  the  basis  of  their 
confounding  patterns,  rather  than  any  of  the  formal  optimality  criteria 
mentioned  earlier. 

Jones  and  Eccleston  (1980)  described  a  computer  algorithm  for  the 
generation  of  optimal  block  designs.  As  an  optimality  criterion,  they 
proposed  minimizing  the  weighted  sum  of  the  variances  of  a  set  of  treatment 
contrasts,  which  is  similar  to  the  criterion  for  A-optimality.  This  criterion 
depends  on  two  characteristics  of  the  design:  the  replication  numbers,  which 
state  how  many  times  each  treatment  will  be  used,  and  the  set  of  concurrences, 
which  gives  the  number  of  times  each  pair  of  treatments  occurs  in  the  same 
block.  The  algorithm  determines  the  replication  numbers  and  concurrences  in 
two  separate  stages,  known  as  exchange  and  interchange.  Beginning  with  an 
initial  design  that  specifies  the  treatments  assigned  to  each  block,  the 
exchange procedure locates runs that contribute little to the optimality
criterion  and  seeks  to  find  different  treatments  for  those  runs  that  will  make 
a  greater  contribution.  Thus  the  exchange  procedure  alters  the  initial 
replication  numbers  and  the  set  of  concurrences.  The  interchange  procedure 
then  seeks  to  improve  the  optimality  criterion  by  switching  the  block 
assignments  of  pairs  of  treatments  (e.g.,  the  blocks  ABC,  DEF  might  be 
changed to ABF, DEC by interchanging treatments C and F). Thus the
interchange  procedure  does  not  affect  the  replication  numbers  but  does  change 
the  set  of  concurrences.  Eccleston  and  Jones  (1980)  extended  the 
exchange-interchange  algorithm  to  designs  for  the  elimination  of  both  row  and 
column  effects. 
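
The two design characteristics on which the criterion depends can be computed directly from a list of blocks. The sketch below is only an illustration of the bookkeeping, not the Jones and Eccleston algorithm; the helper names `replications` and `concurrences` are assumptions. It uses the interchange example given in the text.

```python
from itertools import combinations
from collections import Counter

def replications(blocks):
    """Number of times each treatment appears in the design."""
    return Counter(t for block in blocks for t in block)

def concurrences(blocks):
    """Number of blocks in which each pair of treatments occurs together."""
    pairs = Counter()
    for block in blocks:
        for pair in combinations(sorted(block), 2):
            pairs[pair] += 1
    return pairs

before = ["ABC", "DEF"]
after = ["ABF", "DEC"]   # treatments C and F interchanged

same_replications = replications(before) == replications(after)
same_concurrences = concurrences(before) == concurrences(after)
```

Here `same_replications` is true and `same_concurrences` is false, in line with the statement that an interchange leaves the replication numbers alone but changes the concurrences.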

Wu  (1981a)  presented  a  computer  algorithm  for  assigning  experimental 
units  to  different  treatments  when  categorical  covariate  information  is 
available  for  each  unit.  The  algorithm  is  designed  to  balance  the  covariates 
across  the  different  treatments,  yet  is  surprisingly  simple  and  requires  no 
matrix  inversion. 


5.  DESIGN  ROBUSTNESS

Box  (1953)  introduced  the  word  "robust"  in  the  statistical  literature  to 
describe  procedures  that  give  good  results  even  though  there  might  be 
violations  in  the  assumptions  upon  which  these  procedures  are  based. 

Following  up  a  line  of  research  initiated  by  Pearson  (1931),  Box  (1953) 
examined  the  effect  on  the  analysis  of  variance  and  on  Bartlett's  test  of 
departures  from  normality,  an  assumption  underlying  both  procedures.  Pearson 
(1931)  had  discovered  that  the  analysis  of  variance  is  robust  to  such 
violations  of  assumption,  but  suggested  that  his  conclusion  would  not  be  valid 
for  comparing  estimates  of  variance  based  on  independent  samples.  Box  (1953) 
found that, indeed, Bartlett's test is quite sensitive to departures from
normality.  This  result  led  him  to  observe  that  the  use  of  Bartlett's  test  as 
a  preliminary  to  the  analysis  of  variance  —  a  practice  recommended  by  some 
statisticians  at  the  time  —  was  "rather  like  putting  to  sea  in  a  rowing  boat 
to  find  out  whether  conditions  are  sufficiently  calm  for  an  ocean  liner  to 
leave port!" (p. 333).

The  examination  of  standard  statistical  techniques  to  determine  their 
sensitivity  to  assumptions  and  the  development  of  new  techniques  that  are  less 
sensitive  have  been  focal  points  of  statistical  research  in  the  last  two 
decades  (see  Huber  1981).  Experimental  design  is  an  area  in  which  it  is 
particularly  compelling  to  investigate  questions  of  robustness  because  a 
researcher's assumptions about the experimental process are often crucial in
determining  the  design.  Moreover,  the  design  must  be  chosen  before  the  data 
are  collected  and  so  cannot  be  discarded  if  the  data  indicate  that  the 
assumptions  are  seriously  incorrect.  (By  contrast,  techniques  for  data 
analysis  may  be  replaced  by  other  alternatives  if  their  use  is  contraindicated 
by  the  observed  data.)  Thus  it  is  important  to  examine  experimental  designs 
for  sensitivity  to  assumptions.  Interest  in  design  robustness,  therefore, 
should  come  as  no  surprise;  if  anything,  we  are  surprised  that  this  topic  has 
not  attracted  greater  attention. 

The  assumption  that  underlies  most  research  work  in  experimental  design 
is  that  the  experiment  can  be  adequately  described  by  an  equation  of  the  form: 

response = model + error,  (5.1)

where  the  model  states  the  effect  of  the  predictor  variables  on  the  response 
variable  and  the  error  describes  the  general  form  of  departures  from  the 
model.  Experimenters  frequently  have  tentative  models  in  mind,  either  on  the 
basis  of  theoretical  considerations  or  on  the  belief  that  a  simple  empirical 
model will be adequate, at least over the current range of experimentation.

It  is  unlikely,  however,  that  the  experimenter  will  be  absolutely  certain  that 
any  tentatively  entertained  model  will  be  adequate,  and  design  strategies  that 
fail  to  take  this  uncertainty  into  account  must  be  viewed  with  some 
skepticism.  In  particular,  designs  derived  using  the  optimality  criteria 
discussed  in  Section  3  are  known  to  depend  quite  critically  on  the  particular 
model  that  is  assumed.  These  designs  tend  to  concentrate  all  the  experimental 
runs  on  a  small  number  of  design  points  and  are  ideally  suited  to  estimating 
the  coefficients  of  the  assumed  model,  but  they  provide  little  or  no  ability 
to  check  for  lack  of  fit.  Assumptions  about  the  error  component  in  (5.1)  are 
typically  characterised  in  terms  of  a  probability  distribution  and  are  also 
subject  to  uncertainty.  The  research  work  reviewed  in  this  section  concerns 
the  consequences  for  experimental  design  of  misspecifying  the  form  of  the 
model  or  the  error. 

Two  different,  but  complementary,  approaches  have  been  proposed  for 
planning  experiments  in  the  face  of  model  uncertainty.  The  first  approach  has 
sought  designs  that  will  yield  reasonable  results  for  the  proposed  model  even 
though  it  is  known  to  be  inexact.  We  call  these  designs  "model-robust 
designs"  and  discuss  them  in  Section  5.1.  The  second  approach  has  focused  on 
developing  designs  that  facilitate  improvement  of  the  proposed  model  by  trying 
to  highlight  suspected  inadequacies.  We  call  these  designs  "model-sensitive 
designs"  and  discuss  them  in  Section  5.2. 

Another  line  of  research  in  design  robustness  concerns  the  implications 
for  experimental  design  of  inaccurate  assumptions  about  the  error  rather  than 
the  model.  We  call  these  "error-robust  designs"  and  discuss  them  in  Section 
5.3. 

5.1  Model-Robust  Designs

Box  and  Draper  (1959)  were  the  first  authors  to  consider  in  depth  the 
effect of model misspecification on experimental design. They criticized the
common  optimality  criteria  defined  in  Section  3  for  implicitly  assuming  that 
the  proposed  model  is  exactly  correct.  They  argued  that  a  more  appropriate 
criterion  for  comparing  experimental  designs  is  the  average  mean  squared  error 
(J)  over  a  region  of  interest  R,  which  is  contained  in  the  total  experimental 
region: 

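$$J = \frac{n}{\sigma^2 \Omega} \int_R E\{[\hat{g}(\mathbf{x}) - g(\mathbf{x})]^2\} \, d\mathbf{x}, \tag{5.2}$$

where $g(\mathbf{x})$ is the true response function, $\hat{g}(\mathbf{x})$ is the least squares
estimate of $g(\mathbf{x})$, and $\Omega = \int_R d\mathbf{x}$. This expression can be decomposed as the sum
of a bias component and a variance component:

$$J = \frac{n}{\sigma^2 \Omega} \left( \int_R [E\{\hat{g}(\mathbf{x})\} - g(\mathbf{x})]^2 \, d\mathbf{x} + \int_R \mathrm{var}\{\hat{g}(\mathbf{x})\} \, d\mathbf{x} \right), \tag{5.3}$$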

respectively.  Box  and  Draper  (1959)  considered,  in  particular,  the  effect  of 
assuming  a  first-degree  polynomial  regression  model  when  the  true  model  is  a 
second-degree  polynomial.  They  found  that  the  designs  that  minimized  average 
mean  squared  error  were  similar  to  those  that  minimized  the  bias  component 
alone,  but  were  quite  different  from  those  that  minimized  the  variance 
component.  Thus  their  "minimum  bias"  designs  differed  markedly  from  those 
implied  by  the  traditional  optimality  criteria,  which  consider  only  functions 
of  the  variance.  The  "minimum  bias"  designs  could  be  found  by  choosing  the 
design  points  in  such  a  way  that  specified  moments  of  the  design  matched  those 
of  a  uniform  probability  distribution  on  the  region  of  interest. 
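
A small numerical case may make the contrast concrete. The sketch below assumes a single predictor on R = [-1, 1], a fitted straight line, and a true response with an added term beta2 * x**2; the function names, the four-run two-level designs, and the values beta2 = 1 and sigma = 1 are assumptions chosen for illustration. It evaluates the variance and bias parts of the decomposition (5.3) by Gauss-Legendre quadrature.

```python
import numpy as np

def j_components(design, beta2, sigma=1.0, n_quad=8):
    """Variance and bias parts of J for a straight-line fit on R = [-1, 1]
    when the true response has an extra quadratic term beta2 * x**2."""
    x = np.asarray(design, dtype=float)
    n = x.size
    X = np.column_stack([np.ones(n), x])          # fitted first-degree model
    XtX_inv = np.linalg.inv(X.T @ X)
    # Gauss-Legendre nodes and weights integrate polynomials on [-1, 1] exactly.
    u, w = np.polynomial.legendre.leggauss(n_quad)
    U = np.column_stack([np.ones_like(u), u])
    omega = 2.0                                   # length of R
    var_g = sigma**2 * np.einsum("ij,jk,ik->i", U, XtX_inv, U)
    # Expected fitted line when only the quadratic term is omitted
    # (the intercept and slope of the true model cancel out of the bias).
    expected_fit = U @ (XtX_inv @ X.T @ (beta2 * x**2))
    bias = expected_fit - beta2 * u**2
    scale = n / (sigma**2 * omega)
    return scale * np.sum(w * var_g), scale * np.sum(w * bias**2)

def two_level(a, n=4):
    """n runs split equally between -a and +a."""
    return np.repeat([-a, a], n // 2)

V_edge, B_edge = j_components(two_level(1.0), beta2=1.0)       # runs at +/-1
V_mid, B_mid = j_components(two_level(3 ** -0.5), beta2=1.0)   # second moment 1/3
```

Placing the runs at the ends of the interval gives the smaller variance part, while placing them at plus or minus 1/sqrt(3), where the second design moment equals that of the uniform distribution on R, gives the smaller bias part.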

Box  and  Draper  (1963)  and  Huber  (1975)  reached  similar  conclusions.  Box 
and  Draper  (1963)  extended  the  work  discussed  above  by  studying  the  situation 
in  which  the  assumed  model  is  quadratic  but  the  true  model  is  cubic.  Huber 
(1975) investigated the sensitivity of optimal design to model
misspecification  by  conducting  a  minimax  analysis.  For  a  given  design,  he 
determined  what  true  response  function  would  lead  to  the  greatest  mean  squared 
error.  Huber  found  that  optimal  designs  based  on  first-degree  polynomial 
regression  models  could  be  subject  to  considerable  bias  from  quadratic  terms. 

Kussmaul  (1969)  investigated  the  effect  of  model  misspecification  in 
simple  polynomial  regression.  In  particular,  he  was  concerned  with  the  fact 
that  classical  optimal  designs  tend  to  concentrate  all  the  experimental  runs 
at  a  small  number  of  design  points.  For  example,  the  D-  and  G-optimal  design 
measure  for  estimating  a  polynomial  model  of  degree  j  locates  experimental 
runs at exactly j + 1 distinct levels of the predictor variable. This design
can  provide  no  indication  that  a  higher  degree  polynomial  may  be  needed. 
Kussmaul  suggested  that  this  problem  might  be  overcome  by  using  the  G-optimal 
design  for  a  polynomial  model  of  degree  k,  with  k  slightly  greater  than 
j.  He  concluded  that  the  loss  of  efficiency  using  this  design  strategy,  with 
respect  to  the  G-optimality  criterion,  was  quite  small  and  was  more  than 
offset  by  the  added  protection  of  being  able  to  fit  a  polynomial  of  higher 
degree,  if  necessary. 
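
The difficulty can be seen from the rank of the model matrix. The sketch below is an illustration only: it takes j = 2 on the interval [-1, 1], where the D-optimal levels are -1, 0, and 1, and compares them with the four D-optimal levels for a cubic; the replication counts and the helper name `model_matrix` are assumptions.

```python
import numpy as np

def model_matrix(x, degree):
    """Columns 1, x, ..., x**degree for a one-variable polynomial model."""
    return np.vander(np.asarray(x, dtype=float), degree + 1, increasing=True)

# Three levels, as called for by the optimal design for a quadratic (j = 2).
three_levels = np.repeat([-1.0, 0.0, 1.0], 4)
# Four levels, as called for by the optimal design for a cubic (k = 3).
four_levels = np.repeat([-1.0, -5 ** -0.5, 5 ** -0.5, 1.0], 3)

rank_quadratic = np.linalg.matrix_rank(model_matrix(three_levels, 2))
rank_cubic_3 = np.linalg.matrix_rank(model_matrix(three_levels, 3))
rank_cubic_4 = np.linalg.matrix_rank(model_matrix(four_levels, 3))
```

With three levels the cubic model matrix has rank 3 rather than 4, so the cubic coefficient cannot be estimated; with four levels the rank is 4 and both the quadratic and the cubic can be fitted.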

Lauter  (1974)  considered  the  general  problem  of  optimal  design  when  the 
form  of  the  true  response  function  is  unknown  but  assumed  to  belong  to  a 
specified  class  of  linear  models.  She  proposed  extending  the  common 
optimality  criteria  for  an  exactly  assumed  model  to  the  broader  class  of 
models  by  using  different  forms  of  averaging  with  respect  to  a  weighting 
measure  on  the  class  of  models.  The  resulting  designs  are  not,  in  general, 
optimal  for  any  of  the  models,  but  she  claims  they  should  be  reasonably 
efficient  for  all  models  considered  likely.  Cook  and  Nachtsheim  (1982) 
applied  Lauter's  general  approach  to  polynomial  regression.  They  assumed  that 
a low-degree polynomial would probably be adequate to approximate the true
response  function,  but  that  a  polynomial  of  higher  degree  might  be 
necessary.  Designs  were  compared  on  the  basis  of  average  inefficiency,  where 
the  inefficiency  for  a  polynomial  of  fixed  degree  was  calculated  by  comparison 
with  the  best  design  for  that  degree.  The  average  was  weighted  to  reflect  the 
assumption  that  a  low-degree  polynomial  would  probably  be  adequate  by  placing 
most  of  the  weight  on  those  models. 

Several  authors  have  considered  modifying  the  basic  model  to  include 
possible  effects  of  model  inadequacy.  O'Hagan  (1978)  postulated  a  Bayesian 
model in which $\beta$ in (3.1) is replaced by $\beta(\mathbf{x})$. The dependence of $\beta$ on
$\mathbf{x}$ is characterized by a prior probability distribution that reflects beliefs
about  the  likely  smoothness  and  stability  of  the  true  response  function.  For 
this  model,  he  found  that  a  design  criterion  based  on  posterior  variance 
favored  placing  more  points  near  the  center  of  the  design  region  when  compared 
with constructing designs on the basis of criteria like D-optimality. The
discussion  to  O'Hagan's  paper  provides  a  lively  introduction  to  different 
approaches  to  model  robustness  in  experimental  design.  Smith  and  Verdinelli 
(1980)  adopted  a  hierarchical  Bayesian  model  of  the  form  analyzed  by  Lindley 
and  Smith  (1972).  The  hierarchical  structure  incorporates  a  particular  model 
such  as  a  low-degree  polynomial  but  also  reflects  the  degree  to  which  the 
experimenter  is  confident  that  the  polynomial  model  is  adequate.  They 
examined  the  allocation  of  runs  to  a  fixed  set  of  doses  in  a  dose-response 
experiment  and  found  that  perfect  confidence  in  an  assumed  polynomial  model 
led  to  the  D-optimum  allocation  for  that  model.  As  confidence  in  the  model 
decreased,  however,  the  allocation  changed  smoothly  to  an  even  distribution  of 
runs  among  the  doses.  Pesotchinsky  (1982)  studied  the  implications  for  design 
of  the  "approximately  linear"  regression  model  of  Sacks  and  Ylvisaker  (1978), 
in which the response function is assumed to differ from a first-degree
polynomial  by,  at  most,  a  fixed  convex  function.  He  found  that  the  designs 
depended  on  the  form  of  the  fixed  function,  its  magnitude  relative  to 
experimental  error,  and  the  sample  size.  Although  Pesotchinsky's  approach  to 
incorporating model inadequacy is quite different from those of O'Hagan and of
Smith  and  Verdinelli,  who  used  Bayesian  models,  the  magnitude  of  potential 
bias  relative  to  experimental  error  proved  to  be  a  key  parameter  in  all  three 
papers. 

Wu  (1981b)  considered  a  different  type  of  robustness  —  the  possibility 
that  a  simple  additive  model  for  a  block  design  would  be  violated  by  the 
addition  of  fixed,  but  unknown,  unit  effects.  Using  a  minimax  criterion  to 
study  the  sensitivity  of  different  design  strategies,  he  concluded  that 
randomized  assignment  of  the  units  to  the  treatments  was  the  best  way  to 
obtain  designs  robust  to  the  contaminating  unit  effects.  Wu’s  results  have 
been  generalized  by  Li  (1983). 

5.2  Model-Sensitive  Designs 

The  research  reviewed  above  was  motivated  by  the  desire  to  make  the 
analysis  of  the  experiment  insensitive,  or  robust,  to  possible  uncertainties 
or  inaccuracies  in  the  specification  of  the  model.  The  experimenter's  primary 
interest,  however,  may  be  to  highlight  the  uncertainties  and  inaccuracies  in 
order  to  modify  or  refine  the  model  initially  entertained;  the  experimenter 
will  then  require  a  design  that  is  sensitive  to  the  differences  between 
alternative models. Model-robust and model-sensitive designs are quite
similar to one another and share much common ground because the essential idea
behind both concepts is that tentatively proposed models are never exact.

Some work on model-sensitive designs has been motivated by studies in
which nonlinear models are used (see Section 10), although the theory developed
is  equally  applicable  to  linear  models.  The  experimenter  may  be  able  to  list 
a  set  of  plausible  nonlinear  models,  perhaps  through  knowledge  of  the 
underlying  experimental  mechanism  or  previous  experience  with  similar 
experiments.  Experiments  are  then  designed  whose  primary  purpose  is  to 
discriminate  among  candidate  models.  Hunter  and  Reiner  (1965),  Box  and  Hill 
(1967), Atkinson and Fedorov (1975a,b), and Atkinson (1981) discussed
nonlinear models. Atkinson and Cox (1974) discussed linear models. These
techniques  have  usually  been  referred  to  as  model  discrimination  designs;  they 
are  a  special  case  of  what  we  define  here  as  model-sensitive  designs. 

Hill,  Hunter,  and  Wichern  (1968)  suggested  the  use  of  a  design  criterion 
that  simultaneously  takes  into  account  the  needs  of  model  discrimination  and 
parameter  estimation;  they  illustrated  its  use  with  nonlinear  models. 

Atkinson  (1972)  proposed  a  criterion  for  the  design  of  linear  regression 
experiments  that  have  the  joint  aim  of  estimating  parameters  in  a  tentatively 
assumed  model  and  of  testing  for  inadequacy  of  that  model.  He  considered 
models of the form:

$$y_i = f_1'(\mathbf{x}_i)\beta_1 + f_2'(\mathbf{x}_i)\beta_2 + \varepsilon_i, \tag{5.4}$$

where $f_1(\mathbf{x}_i)$ corresponds to the tentatively assumed model and $f_2(\mathbf{x}_i)$
corresponds to those additional terms thought most likely to induce bias. For
example, $f_1'(\mathbf{x}_i)\beta_1$ might be a first-degree polynomial and $f_2'(\mathbf{x}_i)\beta_2$ might be a
function containing quadratic terms. Selecting from standard classes such as
factorials  and  central  composite  designs,  Atkinson  found  experimental  plans 
that  compromise  between  the  two  goals. 

Jones  and  Mitchell  (1978)  studied  designs  whose  primary  purpose  is  to 
detect  inadequacy  of  a  tentatively  assumed  linear  model  in  the  direction  of  a 
specific  alternative  model.  They  also  considered  models  of  the  form  (5.4), 
but used criteria different from Atkinson's. One major difference is that
Jones and Mitchell's designs depend on the unknown parameter vector $\beta_2$ in
(5.4),  while  Atkinson's  do  not.  Jones  and  Mitchell  suggested  two  methods  to 
overcome  this  dependence,  both  of  which  are  related  to  a  design  criterion 
proposed  by  Atkinson  and  Fedorov  (1975a,b). 

A  related  approach  is  that  of  Stigler  (1971),  who  proposed  the  idea  of 
restricted  D-  and  G-optimal  designs  for  polynomial  regression,  in  which  the 
optimal  design  for  an  assumed  polynomial  model  would  be  found  subject  to  the 
restriction  that  the  design  should  make  it  possible  to  estimate  a  higher- 
degree  model  with  some  minimal  level  of  precision.  This  approach  leads  to  a 
range  of  designs  that  compromise  between  the  unrestricted  optimal  designs,  at 
one  extreme,  and  designs  that  provide  maximum  power  for  testing  that  the 
higher  degree  coefficients  are  all  zero,  at  the  other  extreme.  Studden  (1982) 
described  an  elegant  method  for  constructing  such  designs. 

Morris  and  Mitchell  (1983)  studied  the  special  case  of  designing 
two-level  multifactor  designs  that  are  sensitive  to  detecting  interactions 
among  the  factors.  They  recommended  a  sequential  approach  in  which  screening 
for  interactions  is  done  at  the  earliest  possible  stage  in  an  experiment  and 
then  forms  a  basis  for  planning  subsequent  runs.  They  proposed  a  design 
criterion  and  gave  rules  for  the  construction  of  designs  optimizing  the 
criterion.  Their  methods  make  it  possible  to  screen  for  interactions  with 
only  a  small  number  of  experimental  runs. 

Sometimes  the  degree  of  model  uncertainty  is  so  great  that  it  becomes 
impractical  to  pursue  the  methods  described  above.  The  task  of  specifying  all 
possible  models  or  classes  of  models  and  then  optimizing  a  design  criterion 
may  be  unmanageable  because  there  are  too  many  models  to  deal  with.  In  some 
applications  in  chemical  kinetics,  for  example,  more  than  100  candidate  models 
can be listed, all of them nonlinear. Sometimes an experimenter is confronted
with  the  opposite  situation:  it  is  difficult  to  specify  even  a  single 
model.  In  either  of  these  circumstances  it  may  be  best  to  proceed  by  putting 
forward one model on a tentative basis, even though it is almost certain to be
incorrect  in  some  important  respects.  The  experimenter  might  then  wish  to 
employ  an  efficient  experimental  plan  whose  purpose  is  to  provide  data  that 
will,  with  the  greatest  sensitivity  possible,  reveal  the  inadequacies  of  the 
initial  model  so  that  it  can  be  modified  or  replaced  altogether.  A  desirable 
property  of  such  a  design  is  to  display  shortcomings  in  the  model  in  a  manner 
that  will  best  help  the  investigator  to  create  a  better  model.  The  procedure 
can be repeated with subsequent models. Unfortunately, the literature on
useful  methods  of  model-building  along  this  line  is  limited;  see,  for  example, 
Box  and  Hunter  (1962),  Hunter  and  Mezaki  (1964),  Draper  and  Herzberg  (1971), 
and  Box  and  Draper  (1982). 

5.3  Error-Robust  Designs 

The  error  terms  in  (5.1)  are  typically  assumed  to  be  independent  and 
identically  distributed.  Further,  the  common  distribution  is  often  assumed  to 
be  a  normal  distribution.  This  section  will  review  research  in  design 
robustness  that  has  considered  violations  in  the  assumptions  concerning  the 
distribution  of  the  error  terms. 

Box  and  Draper  (1975)  and  Huber  (1975)  studied  the  possible  effects  of 
outliers on experimental design for linear models. Huber suggested that
resistance to outliers in the error distribution could be achieved by avoiding
outlying points in the experimental design. To accomplish the latter goal, he
recommended using designs for which the diagonal elements of the "hat" matrix
$H = X(X'X)^{-1}X'$ are well below unity (see Hoaglin and Welsch 1978 for a
discussion of the "hat" matrix and its relationship to outliers). Box and
Draper (1975) showed that the effect of one or more wild observations on the
vector of predicted values is proportional to $\sum_i h_{ii}^2$, where $h_{ii}$ is the ith
diagonal element of the matrix H defined above. This sum is minimized if
$h_{ii} = p/n$ for all i, so that Huber's recommendation may be interpreted as
adding  to  the  Box-Draper  criterion  a  requirement  that  p/n  not  be  too 
large.  Draper  and  Herzberg  (1979)  extended  the  work  of  Box  and  Draper  (1975) 
by  studying  the  effect  of  outliers  on  mean  squared  error  when  the  model  is  a 
polynomial  of  low  degree  but  is  subject  to  bias  from  terms  of  higher  degree. 
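
The role of the diagonal elements of the hat matrix can be checked numerically. The sketch below assumes a straight-line model (p = 2) with n = 4 runs; the two designs and the helper names are hypothetical and chosen only to contrast equal leverages with one remote design point.

```python
import numpy as np

def leverages(X):
    """Diagonal elements h_ii of H = X (X'X)^{-1} X'."""
    X = np.asarray(X, dtype=float)
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    return np.diag(H)

def line_design(x):
    """Model matrix for an intercept and a single predictor."""
    return np.column_stack([np.ones(len(x)), x])

balanced = leverages(line_design([-1, -1, 1, 1]))      # every h_ii equals p/n
unbalanced = leverages(line_design([-1, -1, -1, 3]))   # one remote point

sum_sq_balanced = float(np.sum(balanced**2))
sum_sq_unbalanced = float(np.sum(unbalanced**2))
```

Both sets of leverages add up to p = 2, but the sum of squares is smaller when all the leverages equal p/n = 0.5. In the second design the remote point has leverage 1, so its fitted value is determined entirely by its own observation.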

Herzberg  and  Andrews  (1976)  and  Andrews  and  Herzberg  (1979)  considered 
the possibility that some observations would be missing altogether or would be
so  extreme  that  they  would  be  entirely  discarded  from  the  analysis.  They 
proposed  several  measures  of  robustness  against  such  occurrences,  such  as  the 
probability  that  the  "effective"  X  matrix  (i.e.,  the  X  matrix  for  the 
remaining  points)  would  not  have  full  rank  and  the  expected  value  of  the 
D-criterion  for  the  design,  where  the  relevant  probability  distribution  for 
these  calculations  is  that  which  specifies  the  probabilities  that  the  planned 
observations  will  actually  be  usable  in  subsequent  analysis.  They  found  that 
some  conventional  optimal  designs  are  not  robust  under  these  criteria,  which 
tend  to  favor  designs  with  some  repeated  points. 

Another standard assumption is that the error terms $\varepsilon_i$ are
stochastically  independent.  Sacks  and  Ylvisaker  (1966)  studied  the  problem  of 
designing  regression  experiments  when  the  errors  are  correlated,  as  might 
happen  if  the  observations  are  realizations  of  a  time  series.  They  derived 
asymptotic  characterizations  of  optimal  designs,  which  call  for  taking  all  the 
observations  at  distinct  points.  These  designs  differ  from  the  optimal 
designs derived under the assumption of independence, which tend to replicate
many observations at a small number of design points. These results were
generalized by the authors in a later paper (Sacks and Ylvisaker 1968) and by
Wahba (1971). Eubank, Smith, and Smith (1981, 1982) have proved some
uniqueness results for these designs.

A  different  approach  to  the  problem  of  correlated  errors  has  been 
explored by Bickel and Herzberg (1979) and Bickel, Herzberg, and Schilling
(1981).  They  developed  asymptotic  theory  and  numerical  results  for  a  model  in 
which  the  extent  of  the  correlation  among  the  errors  is  assumed  to  decrease 
with  the  sample  size  (as  might  occur,  for  example,  if  additional  observations 
were  spread  over  a  wider  interval).  For  situations  in  which  the  errors  are 
assumed to follow a first-order autoregressive process, the authors derived
designs  for  estimating  location  and  for  fitting  simple  linear  regression 
models.  These  designs  are  described  exactly,  whereas  the  papers  mentioned  in 
the  previous  paragraph  gave  only  complicated  characterizations  of  designs. 

Another  approach  to  time  dependence  is  to  consider  the  possibility  that 
the  sequential  order  of  the  experimental  runs  will  affect  the  results  through 
a  polynomial  trend.  Such  a  trend  can  then  be  included  in  the  model  component 
of  (5.1),  rather  than  in  the  error  component,  and  it  is  possible  to  develop 
designs  that  are  orthogonal  to  the  trend.  This  approach  was  first  explored  by 
Daniel  and  Wilcoxon  (1966)  in  the  context  of  factorial  designs  and  was 
extended  by  Joiner  and  Campbell  (1976).  Bradley  and  Yeh  (1980)  developed  a 
theory  for  trend-free  block  designs. 
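
Orthogonality to a trend is easy to verify for a given run order. The sketch below is an illustration and not a construction from any of the papers cited: it assumes a 2^3 factorial run in standard order with equally spaced runs and a linear trend, and it lists the effect columns that are orthogonal to that trend.

```python
import numpy as np
from itertools import combinations

# 2^3 factorial in standard order; runs are assumed to be made in this order.
levels = np.array([[a, b, c] for c in (-1, 1) for b in (-1, 1) for a in (-1, 1)])
names = "ABC"
trend = np.arange(8) - 3.5          # centered linear time trend

# Contrast column for every main effect and interaction.
effects = {}
for size in (1, 2, 3):
    for idx in combinations(range(3), size):
        label = "".join(names[i] for i in idx)
        effects[label] = levels[:, list(idx)].prod(axis=1)

# Effects whose contrast column has zero inner product with the trend.
trend_free = sorted(k for k, col in effects.items() if col @ trend == 0)
```

In standard order only the interaction columns are free of the linear trend; the main-effect contrasts are not. A trend-robust plan would therefore reorder the runs so that the effects of primary interest fall on columns orthogonal to the trend.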


6.  RESPONSE  SURFACE  DESIGNS 

Response  surface  methodology  was  developed  by  Box  and  his  colleagues  at 
Imperial  Chemical  Industries  to  explore  relationships  such  as  those  between 
the  yield  of  a  chemical  process  and  the  pertinent  process  variables  (Box  and 
Wilson  1951,  Box  1954,  Box  and  Youle  1955).  In  its  usual  form,  response 
surface methodology exploits simple empirical models such as low-degree
polynomials  to  approximate  the  relationship  between  a  response  variable  and  a 
set  of  input  variables  over  a  current  region  of  interest. 

A  key  intellectual  insight  in  the  development  of  response  surface 
methodology  was  the  realization  that  in  chemistry,  engineering,  and  physics, 
experimental  data  are  often  available  for  analysis  much  more  rapidly  than  in 
agriculture.  Thus  an  efficient  way  to  organize  experimental  programs  in 
chemistry,  engineering,  and  physics  is  to  adopt  a  sequential  strategy  in  which 
the  experiment  proceeds  in  stages,  with  each  stage  designed  in  the  light  of 
results  obtained  from  earlier  runs.  The  classic  factorial  designs  formed  a 
basis  for  the  construction  of  the  design  at  each  stage,  but  typically  the 
designs  were  smaller  than  those  used  in  agriculture.  A  second  important 
insight  was  that  the  experimental  variables  in  the  chemical,  physical,  and 
engineering  sciences  are  frequently  quantitative  (continuous),  whereas  the 
variables  in  agricultural  experiments  are  often  qualitative  (categorical); 
this  led  to  the  useful  idea  of  rotatable  designs,  proposed  by  Box  and  Hunter 
(1957),  which  seeks  designs  for  which  the  variance  of  estimated  responses  is 
constant  on  spherical  shells  in  the  region  of  interest.  Once  these 
differences  had  been  recognized,  the  way  was  open  to  develop  new,  more 
efficient  experimental  design  strategies  that  took  advantage  of  them. 
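
The rotatability property can be checked numerically for a particular design. The sketch below assumes two factors, a full second-degree model, and a central composite design with axial distance sqrt(2); the number of center points (three) and the function names are assumptions. It evaluates the variance of the estimated response, in units of the error variance, at points on a circle.

```python
import numpy as np

def quadratic_terms(points):
    """Model matrix for a full second-degree polynomial in two variables."""
    x1, x2 = np.asarray(points, dtype=float).T
    return np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])

alpha = np.sqrt(2.0)   # axial distance giving rotatability for two factors
design = np.array(
    [[-1, -1], [1, -1], [-1, 1], [1, 1],                 # factorial points
     [-alpha, 0], [alpha, 0], [0, -alpha], [0, alpha],   # axial points
     [0, 0], [0, 0], [0, 0]]                             # center points (count assumed)
)
X = quadratic_terms(design)
XtX_inv = np.linalg.inv(X.T @ X)

def scaled_variance(points):
    """Variance of the estimated response divided by sigma^2 at each point."""
    F = quadratic_terms(points)
    return np.einsum("ij,jk,ik->i", F, XtX_inv, F)

angles = np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False)
circle = np.column_stack([np.cos(angles), np.sin(angles)])   # radius 1
on_circle = scaled_variance(circle)
inner = scaled_variance(0.5 * circle)                        # radius 0.5
```

The values in `on_circle` agree with one another, as do those in `inner`, but the two radii give different variances: the precision of the estimated response depends only on the distance from the design center.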

Since  its  introduction  in  the  early  1950's,  response  surface  methodology 
has  become  an  accepted  and  widely  used  set  of  concepts  and  techniques. 

Chapter  11  of  Davies  (1954),  Chapter  8A  of  Cochran  and  Cox  (1957),  Chapter  10 
of  John  (1971),  and  the  book  by  Myers  (1976)  contain  explanations  of  the  basic 
ideas  of  response  surface  methodology,  including  both  the  design  of  response 
surface  experiments  and  the  estimation  and  interpretation  of  the  fitted 
surface.  An  introduction  to  the  subject  at  a  more  elementary  level  is  given 
in  Chapter  15  of  Box,  Hunter,  and  Hunter  (1978).  The  review  article  by  Hill 
and Hunter (1966) contains over one hundred references. Finally, a definitive
book  by  Box  and  Draper  (1984)  is  scheduled  for  publication. 

One  of  the  important  applications  of  response  surface  methods,  in  a 
simplified  form,  has  been  to  improve  the  performance  of  existing  industrial 
processes  by  systematically  varying  process  variables  and  gathering  data  while 
the  processes  operate,  without  upsetting  normal  production.  This  use  of 
response  surface  methods  for  process  improvement  is  known  as  evolutionary 
operation  (EVOP).  EVOP  is  an  aggressive  management  strategy  in  which  better 
ways  of  operating  a  process  are  actively  sought  rather  than  accidentally 
discovered.  Box  and  Draper  (1969)  explain  the  fundamental  principles  and 
methods  of  EVOP  and  discuss  how  an  EVOP  program  can  be  implemented.  Spendley, 
Hext,  and  Himsworth  (1962)  proposed  an  alternative  design  scheme,  known  as 
simplex  EVOP,  in  which  k  process  variables  are  placed  on  a  simplex  with 
k  +  1  vertices.  Hahn  and  Dershowitz  (1974)  discussed  some  of  the  practical 
issues  that  should  be  considered  in  using  EVOP  and  reported  the  results  of  a 
survey  indicating  that  EVOP  is  not  being  used  in  industry  as  widely  as  it 
could  be. 

Much  of  the  statistical  work  on  response  surface  design  in  recent  years 
has  concerned  the  use  of  optimal  design  theory  for  response  surface  studies. 
Some  authors  have  advocated  the  application  of  the  precepts  of  optimal  design 
theory  to  derive  response  surface  designs.  Others,  however,  have  questioned 
the  applicability  of  optimal  design  theory  to  response  surface  experiments. 
Typical  of  the  former  group  is  the  series  of  papers  by  Galil  and  Kiefer 
(1977a,b, 1979), in which they derived optimal designs for quadratic and cubic
polynomial  response  surface  models  when  the  domain  of  the  predictor  variables 
is  assumed  to  be  a  k-dimensional  cube  or  sphere.  Designs  were  derived  for  a 
family  of  optimality  criteria  that  includes  A-,  D-,  and  E-optimality.  The 
efficiency of the designs was compared for these and other criteria in the
family. Pesotchinsky (1978) gave expanded results for quadratic models.

Lucas  (1976)  compared  a  variety  of  designs  for  quadratic  models  on  the  basis 
of the D- and G-optimality criteria. Some of the computer-derived optimal
designs  reported  in  Section  4  (Mitchell  1974b,  Mitchell  and  Bayne  1978,  Galil 
and  Kiefer  1980b,  Welch  1982)  also  apply  to  response  surface  experiments. 

Criticism  of  the  use  of  optimal  design  theory  for  response  surface 
experiments  has  focused  on  several  issues.  One  of  the  major  arguments  against 
the  use  of  optimal  design  theory  is  the  need  to  specify  a  model  for  the 
response  function,  coupled  with  the  fact  that  optimal  designs  are  frequently 
quite  sensitive  to  the  form  of  the  model.  This  concern  stimulated  much  of  the 
research  recounted  in  Section  5  to  achieve  experimental  designs  that  are 
robust  to  the  choice  of  the  model.  Box  and  Draper  (1959)  and  Box  (1982) 
argued  that  to  assume  that  a  linear  model  such  as  (3.1)  exactly  represents  the 
true  response  function  is  especially  troubling  in  response  surface  studies, 
where  the  linear  model  is  never  intended  to  be  more  than  a  reasonable  local 
approximation.  Hence,  they  advised  that  the  possible  effects  of  bias  be 
considered  in  choosing  a  design. 

Box  (1982)  voiced  further  criticism  of  the  use  of  optimal  design  theory 
in  response  surface  studies.  Optimal  designs  for  a  particular  model  and 
criterion  are  found  by  optimizing  the  criterion  over  the  set  of  possible 
designs,  which  is  typically  defined  in  terms  of  a  prescribed  region  of 
experimentation,  within  which  the  predictor  variables  must  be  set.  A  common 
assumption  is  that  the  region  of  experimentation  is  a  simple  geometric  body, 
such  as  a  hypercube  or  hypersphere,  or  can  be  transformed  to  such  a  region  by 
centering  and  scaling  the  predictor  variables.  One  of  the  characteristics  of 
the optimal designs found in the papers mentioned above is that many
experimental runs are placed at the extreme limits of the region. Box (1982)
observed  that,  in  most  response  surface  experiments,  the  region  of  possible 
experimentation  is  not  precisely  known;  moreover,  as  design  points  are  moved 
further  away  from  one  another,  the  effect  of  bias  on  any  simple  approximating 
model  is  likely  to  become  increasingly  severe.  Thus  the  tendency  of  the 
optimal  designs  to  concentrate  many  runs  at  the  extremes  of  the  design  region 
must  be  viewed  with  some  trepidation,  especially  in  the  context  of  response 
surface experiments. Similar criticisms were also stated by O'Hagan (1978)
and  helped  stimulate  his  Bayesian  approach  to  design. 

Other  topics  related  to  the  design  of  response  surface  experiments  have 
recently  been  studied.  Draper  (1982)  discussed  several  methods  for  choosing 
the  number  of  center  points  in  designs  for  quadratic  response  surface 
models.  Hader  and  Park  (1978)  proposed  the  concept  of  slope-rotatable 
designs,  for  which  the  variance  of  the  first  partial  derivatives  of  the 
estimated  response  function  would  be  constant  on  spherical  shells  centered  at 
the  origin.  Slope-rotatability  might  be  a  desirable  property  if  the  primary 
purpose  of  the  experiment  is  to  estimate  the  slope  of  the  response  surface  and 
there  is  equal  interest  in  estimating  the  slope  in  all  directions  from  the 
center  of  the  experimental  region.  This  work  is  related  to  that  by  Box  and 
Hunter  (1957),  who  proposed  the  use  of  rotatable  designs,  which  have  the 
property  that  the  variance  of  the  estimated  response  is  constant  on  spherical 
shells.  Box  and  Draper  (1982)  discussed  several  measures  of  lack  of  fit  for 
response  surface  designs,  and  how  the  fit  might  be  improved  by  power 
transformations  of  the  predictor  variables.  Box  and  Draper  (1980)  discussed  a 
geometric  interpretation  for  the  variance  of  the  difference  between  two 
estimated  responses  and  gave  results  for  quadratic  and  cubic  rotatable 
designs. 


7.  MIXTURE  DESIGNS 


In  some  experimental  situations  the  response  depends  on  the  relative 
amounts  of  the  predictor  variables,  but  not  on  the  absolute  amounts.  Typical 
examples  would  be  car  mileage  as  a  function  of  the  proportions  of  components 
blended  into  gasoline  or  the  strength  of  an  alloy  as  a  function  of  the 
fractional  amounts  of  constituent  metals.  The  special  nature  of  these 
experiments, known as mixture experiments, can be expressed in the following
set of constraints: if x_1, ..., x_k denote the k predictor variables,
measured as proportions, then for each experimental run we must have:

    0 \le x_j \le 1 for all j, and \sum_{j=1}^{k} x_j = 1.    (7.1)

This  constraint  presents  some  special  problems  for  experimental  design  and 
statistical  modeling  because  any  model  which  contains  linear  terms  in  all  the 
predictor  variables  and  a  constant  term  will  be  overparameterized:  the  sum  of 
the  k  linear  coefficients  must  be  confounded  with  the  constant  term  due  to 
the  constraint  (7.1).  Although  the  particular  concern  of  mixture  experiments 
is  summarized  by  the  constraint  (7.1),  the  theory  can  be  applied  more 
generally  to  any  problem  in  which  there  exist  one  or  more  linear  constraints 
on  the  predictor  variables. 
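
To make the overparameterization concrete, the short derivation below may help. It is our own illustration rather than part of the original argument, and the coefficient symbols \beta_0, \beta_1, ..., \beta_k and \beta_j^{*} are names introduced here for a generic first-order model.

```latex
\begin{align*}
E(y) &= \beta_0 + \sum_{j=1}^{k} \beta_j x_j \\
     &= \beta_0 \sum_{j=1}^{k} x_j + \sum_{j=1}^{k} \beta_j x_j
        && \text{because } \sum_{j=1}^{k} x_j = 1 \text{ by (7.1)} \\
     &= \sum_{j=1}^{k} (\beta_0 + \beta_j)\, x_j
      = \sum_{j=1}^{k} \beta_j^{*} x_j .
\end{align*}
```

Only the k combinations \beta_j^{*} = \beta_0 + \beta_j can be estimated, so a model with k + 1 parameters carries one parameter too many. Dropping the constant term removes the redundancy.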

A  fortuitous  circumstance  in  the  development  of  statistical  procedures 
for mixture experiments was Scheffé's work as a consultant for Chevron
Research  Corporation.  Investigators  there  who  were  working  on  problems 
related  to  gasoline  blending  asked  him  for  advice  on  the  design  of  experiments 
in  which  the  relative  proportions  of  particular  formulations  were  to  be 
varied. These problems stimulated Scheffé to undertake the first systematic
statistical study of mixture experiments. Scheffé (1958) introduced a family
of models for mixture problems and proposed the class of lattice designs,
which place experimental runs on a uniform lattice of points, enabling the
experimenter to explore response variables throughout the entire design
simplex. Scheffé (1963) proposed simplex-centroid designs, in which runs are
made  using  mixtures  that  have  equal  proportions  of  some  subset  of  the 
components.  One  stimulus  for  the  simplex-centroid  designs  was  to  correct  a 
weakness  of  the  lattice  designs  —  their  tendency  to  use  many  experimental 
mixtures  that  involve  only  two  components,  even  when  the  number  of  components 
is  large. 
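
The two families of designs can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not code from any of the cited papers: the function names and the parameter m (the number of equal steps into which each proportion is divided) are introduced here for illustration.

```python
from fractions import Fraction
from itertools import combinations, product


def simplex_lattice(k, m):
    """Blends of k components whose proportions are multiples of 1/m and sum to 1."""
    return [
        tuple(Fraction(c, m) for c in counts)
        for counts in product(range(m + 1), repeat=k)
        if sum(counts) == m
    ]


def simplex_centroid(k):
    """One blend per non-empty subset of components, mixed in equal proportions."""
    points = []
    for size in range(1, k + 1):
        for subset in combinations(range(k), size):
            points.append(
                tuple(Fraction(1, size) if j in subset else Fraction(0)
                      for j in range(k))
            )
    return points


def components_used(point):
    """Number of components actually present in a blend."""
    return sum(1 for x in point if x > 0)


lattice = simplex_lattice(5, 2)
centroid = simplex_centroid(5)
print(len(lattice), max(components_used(p) for p in lattice))
print(len(centroid), max(components_used(p) for p in centroid))
```

With five components and m = 2, none of the 15 lattice blends involves more than two components, whereas the 31 centroid blends include mixtures of three, four, and all five components. This is the weakness of the lattice designs that the simplex-centroid designs were meant to correct.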

A  difficulty  encountered  in  many  mixture  experiments  is  that  some  of  the 
components  are  subject  to  upper  or  lower  bounds.  Such  bounds  can  produce 
design  regions  with  odd  shapes  for  which  it  is  impossible  to  use  the  designs 
mentioned  above.  McLean  and  Anderson  (1966)  proposed  solving  this  problem  by 
making  experimental  runs  at  the  extreme  points  and  various  centroids  of  the 
constrained  design  region.  These  plans  are  known  as  extreme  vertices  designs 
and, as with Scheffé's designs, they allow exploration of the entire
experimental  region. 
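
The vertex-finding step can be sketched as follows. This is a simplified illustration in the spirit of the extreme vertices idea, not a reproduction of McLean and Anderson's published procedure: it enumerates only the vertices (the centroids are omitted), and the bounds used in the example are hypothetical.

```python
from fractions import Fraction
from itertools import product


def extreme_vertices(lower, upper):
    """Vertices of the region {x : lower <= x <= upper, sum(x) = 1}.

    At a vertex, at least k - 1 components sit on one of their bounds, so we
    set k - 1 components to bounds, solve for the remaining one from the
    mixture constraint, and keep the point if that component is admissible.
    """
    k = len(lower)
    vertices = set()
    for free in range(k):
        others = [j for j in range(k) if j != free]
        for choice in product((0, 1), repeat=k - 1):
            x = [None] * k
            for j, at_upper in zip(others, choice):
                x[j] = upper[j] if at_upper else lower[j]
            x[free] = 1 - sum(x[j] for j in others)
            if lower[free] <= x[free] <= upper[free]:
                vertices.add(tuple(x))
    return sorted(vertices)


# Hypothetical bounds for a three-component mixture.
lower = [Fraction(2, 10), Fraction(1, 10), Fraction(1, 10)]
upper = [Fraction(6, 10), Fraction(5, 10), Fraction(5, 10)]
vertices = extreme_vertices(lower, upper)
for vertex in vertices:
    print([str(x) for x in vertex])
```

Exact fractions are used so that the bound checks are not disturbed by floating-point rounding.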

Much  of  the  subsequent  work  on  designs  for  mixture  experiments  has  roots 
in the models and designs of Scheffé and in the extreme vertices designs of
McLean  and  Anderson.  The  most  complete  reference  for  mixture  problems  is 
Cornell  (1981),  which  is  a  readable  introduction  to  the  topic  and  also 
discusses  much  of  the  most  recent  research.  Reviews  of  this  work  are  also 
available  (Cornell  1973,  1979).  Although  first  proposed  for  gasoline  blending 
experiments,  mixture  designs  have  been  applied  in  a  wide  variety  of 
situations.  We  believe  that,  in  the  future,  applications  of  mixture  designs 
will extend to many new fields as more experimenters become aware of their
usefulness. The most interesting example that has come to our attention
recently is the use of mixture designs at a cooperative in France for blending
different wines to produce a table wine. Previously, only blends that
qualified as ordinary table wines had been produced. However, the designs
succeeded in identifying a blend that received the higher grade of vin
délimité de qualité supérieure, allowing the cooperative to sell it at a
premium  price.  In  the  remainder  of  this  section,  we  will  review  the  most 
recently  published  research  on  mixture  designs. 

The  development  of  computer  programs  to  assist  in  selecting  experimental 
runs  has  been  a  particular  concern  in  mixture  problems,  especially  when  there 
are  additional  bounds  on  some  of  the  components.  The  extreme  vertices  designs 
have  been  quite  popular  here,  so  that  most  computer  programs  proceed  in  two 
stages:  first,  the  extreme  vertices  and  centroids  of  the  constrained  design 
region  are  identified;  then,  a  design  optimality  criterion  is  used  to  select 
the  vertices  and  centroids  that  will  be  included  in  the  design.  For  specific 
references,  we  refer  the  reader  to  our  discussion  of  these  programs  in  Section 
4  on  computer-aided  design. 

A  common  situation  in  mixture  experiments  is  that,  in  addition  to  k 
mixture  variables,  there  are  some  process  variables  that  are  not  subject  to 
the  constraint  (7.1).  Experimental  designs  for  these  problems  must  specify 
settings  for  both  the  mixture  variables  and  the  process  variables.  Hare 
(1979)  generated  designs  by  restricting  the  mixture  variables  to  a  cuboidal 
subset  contained  within  the  simplex  defined  by  (7.1)  and  then  crossing  that 
subset  with  a  cuboidal  region  for  the  process  variables.  A  different  approach 
to  designing  experiments  with  both  mixture  and  process  variables  was  taken  by 
Vuchkov,  Damgaliev,  and  Yontchev  (1981).  They  used  a  sequential  procedure  to 
generate  quadratic  designs  with  high  efficiency  in  terms  of  the  D-optimality 
criterion.

When the experimenter wishes to explore only a limited region of the
design  simplex,  an  alternative  approach  that  we  feel  deserves  further 
attention  is  to  replace  the  k  linearly  dependent  mixture  components  by 
k  -  1  linear  functions  of  the  components,  often  called  pseudo-components.  By 
treating  the  pseudo-components  as  the  independent  variables  in  the  experiment, 
any  standard  response  surface  design  that  fits  inside  the  simplex  can  be 
used.  No  special  consideration  is  necessary  for  process  variables:  they  can 
be  included  as  additional  variables  in  the  response  surface  design.  Box  and 
Gardner  (1966)  proposed  a  similar  idea.  Their  projection  designs  were  defined 
by  taking  standard  designs  like  two-level  factorials  and  adjusting  them  to 
meet  one  or  more  linear  constraints  like  (7.1).  A  disadvantage  of  the 
projection  and  pseudo-component  designs  is  that  they  are  typically  not 
symmetric  with  respect  to  the  original  mixture  components. 

Cornell  and  Khuri  (1979)  proposed  designs  for  three-component  mixture 
models  with  the  property  that  the  variance  of  the  predicted  response  is 
constant  on  concentric  triangles  about  the  centroid  of  the  simplex.  This  idea 
is  analogous  to  the  concept  of  rotatable  designs  in  response  surface  studies 
(Box  and  Hunter  1957).  The  designs  are  constructed  by  performing  a  nonlinear 
transformation  of  the  coordinate  system  and  then  applying  the  theory  of 
rotatable  designs.  Piepel  (1983)  offered  guidelines  to  check  the  consistency 
of  linear  constraints  used  to  restrict  the  region  of  experimentation. 

8.  FACTORIAL  DESIGNS 

Factorial  designs,  first  developed  by  Fisher  and  Yates  at  Rothamsted,  are 
one of the major contributions of statistical insight into experimental
design. Their essential feature, the simultaneous study of several factors,
is a marked departure from the common idea that experimenters should vary only
one factor at a time. As Fisher (1926) observed, factorial designs offer many
advantages: each experimental run gives information on several factors, not
just one; the experiment yields as much information about each factor as
though  it  alone  had  been  varied;  valuable  additional  information  is  available 
through  the  ability  to  check  for  possible  interactions  among  the  factors;  and 
in  the  event  that  no  interactions  are  found,  there  is  a  much  broader  base  for 
generalizing  conclusions  on  the  main  effect  of  a  factor,  since  the  effect  has 
been  observed  in  a  variety  of  experimental  conditions. 

A  further  advance  was  the  introduction  by  Finney  (1945)  of  fractional 
factorial  designs.  These  designs  allow  experimenters  to  study  the  main 
effects  and  low-order  interactions  of  several  factors  in  far  fewer  runs  than 
required  to  complete  the  full  factorial  designs  by  sacrificing  the  ability  to 
estimate  high-order  interactions.  Fractional  factorial  designs  thus  offer 
great  economy  of  time  and  resources  when,  as  is  often  the  case,  high-order 
interactions  are  negligible.  Plackett  and  Burman  (1946)  described  a  useful 
class  of  highly  fractionated  orthogonal  designs,  in  which  the  main  effects 
of  n  -  1  two-level  factors  are  estimated  using  just  n  runs.  Box  and 
Hunter (1961a,b) described in detail the theory and application of 2^{k-p}
fractional  factorial  designs.  For  experiments  in  which  some  factors  are  used 
at  more  levels  than  others,  Addelman  and  Kempthorne  (1961)  and  Addelman  (1962) 
presented  a  simple  technique  for  deriving  designs  that  give  orthogonal 
estimates  of  main  effects.  The  important  contributions  that  factorial  and 
fractional  factorial  designs  can  make  to  experimentation  in  the  chemical, 
physical,  and  engineering  sciences  were  clearly  evident  to  the  initial  editors 
of Technometrics: many articles described these designs and illustrated their
usefulness. 
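
As a small illustration of the economy of highly fractionated designs, the sketch below builds a 12-run two-level design for 11 factors by cyclically shifting a generating row and appending a run with every factor at its low level. The generating row is the one commonly quoted for the 12-run Plackett-Burman design; the function names are our own, and the check covers only balance and pairwise orthogonality of the main-effect columns.

```python
# Generating row of the 12-run design (+1 = high level, -1 = low level).
GENERATOR = [+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1]


def plackett_burman_12():
    """Eleven cyclic shifts of the generating row plus one all-low run."""
    n = len(GENERATOR)
    rows = [[GENERATOR[(j - shift) % n] for j in range(n)] for shift in range(n)]
    rows.append([-1] * n)
    return rows


def is_orthogonal_main_effect_plan(design):
    """True if every column is balanced and every pair of columns is orthogonal."""
    n_cols = len(design[0])
    columns = [[row[j] for row in design] for j in range(n_cols)]
    balanced = all(sum(col) == 0 for col in columns)
    orthogonal = all(
        sum(a * b for a, b in zip(columns[i], columns[j])) == 0
        for i in range(n_cols)
        for j in range(i + 1, n_cols)
    )
    return balanced and orthogonal


design = plackett_burman_12()
print(len(design), len(design[0]), is_orthogonal_main_effect_plan(design))
```

Because the columns are balanced and mutually orthogonal, each main effect is estimated independently of the others, with all 12 runs contributing to every estimate.
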

The most commonly used factorial designs can be found in most books on
experimental  design.  John  (1971)  is  an  excellent  source  for  the  factorial 
designs  used  most  often  in  practice:  two-  and  three-level  factorials  and 
fractional  factorials,  including  Plackett-Burman  designs,  main  effect  plans, 
and  some  asymmetric  factorials  (i.e.,  designs  in  which  not  all  factors  have 
the  same  number  of  levels).  John's  book  is  directed  toward  readers  with  some 
mathematical  and  statistical  sophistication.  Daniel  (1976)  also  presents  many 
useful  factorial  plans  and  describes  a  number  of  interesting  applications;  the 
level  is  less  theoretical  than  John  (1971),  but  some  background  in  statistics 
is  necessary.  At  a  more  elementary  level,  the  book  by  Box,  Hunter,  and  Hunter 
(1978)  describes  two-level  factorial  and  fractional  factorial  designs  in 
considerable  detail.  A  good  source  for  asymmetric  factorial  plans  is  the  book 
by  Cochran  and  Cox  (1957),  which  also  covers  the  topics  mentioned  immediately 
above,  although  with  a  bias  toward  agricultural  examples  and  terminology. 
Davies  (1954)  also  lists  many  useful  factorial  designs.  Raktoe,  Hedayat,  and 
Federer  (1981)  is  a  concise  but  comprehensive  treatise  on  the  mathematical 
theory  underlying  factorial  designs,  with  only  a  limited  emphasis  on 
applications. 

Recent  research  on  factorial  designs  has  considered  several  problems, 
including  incomplete  factorials,  weighing  designs,  screening  designs, 
asymmetric  factorials,  and  blocking  schemes.  A  brief  review  follows. 

John  (1979)  and  Smith  and  Schmoyer  (1982)  both  considered  the  effect  on 
two-level  factorial  designs  of  incomplete  replication.  John  showed  that 
losing a single observation from a 2^k factorial experiment could double the
variance of some of the estimated factor effects. He also examined the effect
of missing observations on design resolution for 2^{k-p} experiments. Smith
and Schmoyer investigated the consequences for a two-level factorial of
terminating the experiment prior to completing all 2^{k-p} runs in the original
plan.  Such  a  situation  might  arise  due  to  equipment  failure  or  to  a  conscious 
decision  to  cease  experimentation,  and  it  is  especially  relevant  to  the 
physical  sciences,  where  experiments  are  often  run  sequentially  (as  opposed  to 
the  simultaneous  experimentation  more  common  in  agriculture).  They  considered 
two  strategies:  augmenting  the  best  main  effects  plan  for  k  factors  run  by 
run and deleting runs one by one from the complete design. In both cases, the
run to be added or deleted is chosen on the basis of D-optimality.

Fries  and  Hunter  (1980)  proposed  the  concept  of  minimum  aberration  to 
compare 2^{k-p} designs of equal resolution. This concept generalizes the
notion of design resolution, which characterizes factorial designs by stating
which high-order interactions must be negligible in order to assure that
all  main  effects  and  low-order  interactions  can  be  estimated.  (For  example,  a 
design  is  said  to  have  resolution  III  if  all  the  main  effects  can  be 
estimated,  provided  that  all  of  the  interaction  terms  are  negligible.)  Fries 
and  Hunter  defined  the  aberration  of  a  design  as  the  number  of  words  of 
minimal  length  in  the  defining  relation  for  the  design  and  gave  examples  in 
which  this  could  be  used  to  compare  designs  of  equal  resolution. 
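
The comparison can be sketched in code. The example below is an illustration under our own assumptions: factors are labeled by letters, each generator is written as a word in the defining relation, and the two 2^{7-2} designs are chosen only to show two designs of equal resolution that differ in aberration as defined above.

```python
from collections import Counter
from itertools import combinations


def defining_words(generators):
    """All words in the defining relation generated by the given words.

    Multiplying two words cancels repeated letters, which is the symmetric
    difference of the corresponding letter sets.
    """
    gens = [frozenset(g) for g in generators]
    words = set()
    for size in range(1, len(gens) + 1):
        for subset in combinations(gens, size):
            word = frozenset()
            for g in subset:
                word = word ^ g
            words.add(word)
    return words


def resolution_and_aberration(generators):
    """Resolution (shortest word length) and the number of words of that length."""
    lengths = Counter(len(w) for w in defining_words(generators))
    resolution = min(lengths)
    return resolution, lengths[resolution]


design_1 = ["ABCF", "BCDG"]  # I = ABCF = BCDG = ADFG
design_2 = ["ABCF", "ADEG"]  # I = ABCF = ADEG = BCDEFG
print(resolution_and_aberration(design_1))
print(resolution_and_aberration(design_2))
```

Both designs have resolution IV, but the first has three words of length four in its defining relation and the second only two, so the second has less aberration.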

Srivastava and Gupta (1979) considered the use of resolution III 2^{k-p}
designs  when  some  of  the  interaction  terms  are  present.  In  particular,  they 
proposed  designs  that  allow  for  the  detection  and  estimation  of  an  interaction 
term,  assuming  that  no  more  than  one  interaction  is  nonnegligible. 

Galil  and  Kiefer  (1980a,  1982)  thoroughly  studied  the  problem  of 
D-optimal  design  for  weighing  experiments  and  gave  extensive  tables  of  the 
known  D-optimal  designs.  The  objective  of  weighing  experiments  is  to 
determine  the  individual  weights  of  k  objects  in  n  weighings.  For  each 
weighing, each object must be placed in the right pan of the scale, in the
left pan, or not weighed. By identifying the right and left pans of the scale
with the two levels of a factor, the weighing design model can be seen as a
general paradigm for two-level factorial experiments in n runs; in
particular, the 2^{k-p} experiments are a special subset of the general
weighing design problem. Galil and Kiefer creatively combined theoretical
calculations with computer search to prove the D-optimality of some previously
suggested designs and to derive new D-optimal weighing designs. Special
attention was given to the most difficult case: n ≡ 3 (mod 4). Cheng (1980b)
showed  that  certain  weighing  designs,  including  fractional  factorials,  are 
optimal  with  respect  to  a  very  general  class  of  criteria. 
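
A toy computation suggests why run sizes that are not multiples of 4 are awkward. The sketch below is our own illustration and does not reproduce the methods of Galil and Kiefer: it takes the smallest case n = k = 3, assumes every object is placed on one pan or the other in every weighing (entries +1 or -1 only), and finds the largest value of det(X'X) by exhaustive search.

```python
from itertools import product


def det3(m):
    """Determinant of a 3 x 3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def best_information_determinant():
    """Largest det(X'X) over all 3 x 3 designs with entries +1 or -1."""
    best = 0
    for entries in product((-1, 1), repeat=9):
        x = (entries[0:3], entries[3:6], entries[6:9])
        best = max(best, det3(x) ** 2)  # det(X'X) = det(X)^2 for square X
    return best


n = 3
best = best_information_determinant()
orthogonal_bound = n ** n  # value of det(X'X) if the columns could be orthogonal
print(best, orthogonal_bound)
```

The search stops short of the value that mutually orthogonal columns would give, because three columns of +1 and -1 entries cannot be orthogonal in three runs. When no orthogonal design exists, the optimal design has to be found by other arguments or by search.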

Group  screening  designs  are  useful  when  a  large  number  of  factors  must  be 
considered  and  it  is  desired  to  find  the  most  important  factors  with  a  minimum 
of  experimental  runs.  Mauro  and  Smith  (1982)  investigated  the  efficiency  of 
two-stage  group  screening  designs  in  which  potentially  similar  factors  are 
treated  as  a  single  factor  and  varied  in  unison  during  a  first  stage 
experiment;  a  second  experiment  then  studies  the  significant  factor  groups  in 
detail.  Mauro  and  Smith  found  that  these  designs  performed  quite  well,  both 
in  terms  of  identifying  significant  effects  and  minimizing  the  number  of  runs, 
even  when  the  initial  grouping  is  based  on  little  prior  knowledge. 

Several  authors  have  described  low-resolution  plans  for  other  factorial 
designs. Anderson and Thomas (1979) gave resolution IV designs for s^k
factorials, where s is a power of a prime. The designs require s(s-1)k
runs, which is near the theoretical lower bound for an s^k experiment to have
resolution IV. Chacko, Dey, and Ramakrishna (1979) derived orthogonal main
effect plans for 4^3 2^k experiments and showed how they could also be used to
construct orthogonal main effect plans for 4^r 3^s 2^k experiments when
2 ≤ r + s ≤ 3. Gupta, Nigam, and Dey (1982) derived orthogonal main effect
plans for t × s^k factorial experiments.


Cyclic  designs  have  proven  to  be  a  useful  method  to  generate  blocking 
schemes  for  general  factorial  designs.  These  designs  exploit  the  theory  of 
cyclic  groups  and  are  quite  easy  to  construct.  The  construction  and  analysis 
of  cyclic  designs  for  symmetric  factorials  was  described  in  John  and  Dean 
(1975); Dean and John (1975) extended the theory to asymmetrical factorials.
The  latter  article  also  listed  designs  for  various  factor  combinations  and 
blocking  patterns.  John,  Wolock,  and  David  (1972)  presented  an  extensive 
catalog  of  cyclic  designs.  John  (1981)  gave  a  concise  list  of  efficient 
cyclic  designs. 


9.  BLOCK  DESIGNS 

Block  designs  epitomize  one  of  Fisher's  basic  concepts  of  the  statistical 
design  of  experiments:  the  importance  of  setting  off  experimental  runs  into 
small  groups  (blocks)  that  are  highly  homogeneous,  in  order  to  increase  the 
precision  of  the  experiment.  Classical  block  designs  are  intended  for 
experiments  with  a  single  factor  that  has  many  levels,  unlike  the  factorial 
experiments  described  in  Section  8,  which  involve  many  factors,  usually  at 
only two or three levels. When there is only one factor, its levels are
usually  referred  to  as  treatments  (or  varieties)  and  the  principal  goal  of  the 
experiment  usually  involves  comparison  of  the  treatments.  The  purpose  of  the 
blocking  scheme,  then,  is  to  increase  the  precision  of  comparisons  among  the 
different  treatments. 

The  classic  blocking  plans  are  randomized  block  designs  (for  blocking  a 
single  factor),  Latin  Squares  and  their  generalizations  (for  blocking  several 
factors  simultaneously),  and  incomplete  block  designs  (when  the  number  of 
treatments  exceeds  the  number  of  experimental  units  in  each  block).  Detailed 
descriptions of these and other blocking schemes are available in many books
on experimental design. In particular, the books by Cochran and Cox (1957)
and  Kempthorne  (1952)  are  good  sources;  both  books  list  many  designs.  John 
(1971)  and  Davies  (1954)  also  describe  many  useful  block  designs. 

Block designs have been the subject of much recent statistical
research.  In  particular,  recent  work  has  focused  on  the  application  of 
optimal  design  theory  to  block  designs.  This  area  is  especially  conducive  to 
optimal  design  theory  because,  for  many  block  designs,  a  linear  statistical 
model  and  a  precise  design  region  can  be  clearly  stated.  Thus  the  criticisms 
surrounding  the  application  of  optimal  design  theory  to  response  surface 
studies  (see  Section  6)  are  not  serious  problems  for  block  designs.  The 
remainder  of  this  section  will  review  some  recent  results. 

One  problem  that  has  attracted  considerable  attention  is  the  design  of 
unbalanced  incomplete  block  designs.  The  most  efficient  incomplete  block 
designs  for  studying  v  treatments  in  b  blocks  of  k  units  each  are 
balanced  incomplete  block  designs,  in  which  each  pair  of  treatments  occurs 
jointly  in  the  same  number  of  blocks.  Not  all  combinations  of  v,  b,  and  k, 
however,  permit  the  construction  of  a  balanced  design.  To  aid  in  finding  good 
incomplete  block  designs  when  no  balanced  design  exists,  John  and  Mitchell 
(1977)  introduced  the  concept  of  regular  graph  designs.  These  are  incomplete 
block  designs  in  which  each  pair  of  treatments  occurs  jointly  in  either 
λ_1 or λ_2 blocks, where λ_2 = λ_1 + 1. John and Mitchell showed that these
designs  are  related  to  a  regular  graph  with  v  nodes  and  used  graph-theoretic 
methods  to  study  their  properties.  They  proved  that  many  of  the  regular  graph 
designs  possess  optimality  properties.  Cheng  (1978a)  showed  that  regular 
graph  designs  are  optimal  with  respect  to  a  large  class  of  optimality 
criteria, and Cheng (1980a) and Jacroux (1980) gave conditions for the
existence of E-optimal regular graph designs. Cheng and Gray (1980) showed
that some special types of regular graph designs are also group divisible.
Cheng  and  Wu  (1981)  extended  the  notion  of  regular  graph  designs  to  include 
experiments  in  which  the  treatments  are  not  equally  replicated. 
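
The distinction between balanced designs and regular graph designs comes down to how often each pair of treatments shares a block. The sketch below is an illustration of our own: the cyclic constructions and the function names are not taken from the cited papers, and the classification looks only at pair concurrences, taking equal block sizes and equal replication for granted (both hold for the cyclic examples used here).

```python
from itertools import combinations


def cyclic_blocks(initial_block, v):
    """Develop an initial block cyclically modulo v to obtain v blocks."""
    return [sorted((t + shift) % v for t in initial_block) for shift in range(v)]


def concurrence_values(blocks, v):
    """Distinct numbers of blocks shared by pairs of treatments 0, ..., v - 1."""
    counts = {pair: 0 for pair in combinations(range(v), 2)}
    for block in blocks:
        for pair in combinations(sorted(block), 2):
            counts[pair] += 1
    return sorted(set(counts.values()))


def classify(blocks, v):
    """Classify a design by its pair concurrences only."""
    values = concurrence_values(blocks, v)
    if len(values) == 1:
        return "balanced"
    if len(values) == 2 and values[1] - values[0] == 1:
        return "regular graph"
    return "neither"


seven = cyclic_blocks([0, 1, 3], 7)  # 7 treatments in 7 blocks of size 3
six = cyclic_blocks([0, 1, 3], 6)    # 6 treatments in 6 blocks of size 3
print(classify(seven, 7), concurrence_values(seven, 7))
print(classify(six, 6), concurrence_values(six, 6))
```

The seven-treatment design is balanced, with every pair occurring together once. In the six-treatment design, pairs occur together either once or twice, so it satisfies the regular graph condition with λ_1 = 1 and λ_2 = 2.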

Jacroux  (1982,  1983)  investigated  incomplete  block  designs  for  which  the 
treatments  were  not  equally  replicated  and  derived  some  sufficient  conditions 
for  such  designs  to  be  E-optimal.  Results  on  the  E-optimality  of  some 
balanced  and  partially  balanced  incomplete  block  designs  were  also  given  by 
Constantine (1981, 1982).

Hall  and  Jarrett  (1981)  gave  tables  of  incomplete  block  designs  for 
experiments  with  many  treatments  (10  <  v  <  60)  but  no  more  than  5  replicates 
per  treatment  and  block  sizes  of  at  most  10.  John  (1978)  described  a  new 
balanced incomplete block design for v = 18 treatments in b = 51 blocks,
with  six  runs  per  block  and  17  replicates  of  each  treatment.  The  design  is 
resolvable  and  can  be  split  into  useful  partially  balanced  subdesigns. 

Other  research  work  has  studied  optimality  properties  of  designs  that 
simultaneously  block  several  factors.  Kiefer  (1975)  showed  with  an  elegant 
proof  that  generalized  Youden  designs  for  simultaneous  blocking  of  two  sources 
of  variation  are  optimal  with  respect  to  a  large  class  of  optimality 
criteria.  Jacroux  (1982)  gave  E-optimal  designs  for  two-way  blocking  for 
experiments with unequally replicated treatments. Cheng (1978b) defined
Youden  hyperrectangles,  which  are  higher-dimensional  generalizations  of 
generalized  Youden  designs  and  balanced  block  designs  that  allow  for  blocking 
many  sources  of  variation,  and  proved  various  optimality  properties  for  these 
designs.  Cheng  (1979)  gave  methods  for  their  construction.  Cheng  (1981) 
showed  that  in  some  cases  the  optimality  properties  of  generalized  Youden 
designs also hold for a less restrictive class of designs. He called these
"pseudo-Youden" designs and gave suggestions on how to construct them.

The blocking schemes described thus far assume that the factors to be
blocked  in  an  experiment  to  compare  treatments  are  categorical  variables. 
Often,  however,  important  concomitant  variables  are  continuous  in  nature. 
Harville  (1974,  1975)  and  Cook  and  Thibodeau  (1980)  have  studied  the  optimal 
allocation  of  experimental  units  to  different  treatments  when  there  is 
covariate  information  available  for  each  unit  at  the  time  of  assignment. 

Several  authors  have  studied  the  problem  of  block  designs  for  experiments 
where  the  observations  may  be  subject  to  a  correlated  error  structure.  This 
problem  has  attracted  attention  primarily  in  agricultural  experimentation, 
where  observations  from  physically  adjacent  plots  may  be  correlated,  but  is 
applicable  in  a  broad  range  of  situations.  When  such  plot-to-plot  effects  are 
non-directional (i.e., the errors for two neighboring plots both affect each
other), Freeman (1981) recommended the use of quasi-complete Latin squares,
which  are  Latin  squares  with  the  property  that  every  unordered  pair  of 
elements  occurs  adjacently  twice  in  rows  and  twice  in  columns.  Sonneman 
(1982)  studied  the  case  when  the  plot-to-plot  effects  are  directional  (i.e., 
the  error  for  plot  i  affects  the  error  for  plot  i  +  1,  but  there  is  no 
effect  in  the  other  direction),  as  might  occur  in  a  repeated  measurement 
experiment.  He  proved  that  complete  Latin  squares,  in  which  every  ordered 
pair  of  elements  occurs  adjacently  once  in  rows  and  once  in  columns,  are 
D-optimal.  Martin  (1982)  presented  regular  and  treatment-balanced  designs  for 
arranging  treatments  on  a  torus  when  the  correlations  are  assumed  to  follow  a 
second-order  stationary  lattice  process. 
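
The defining property of a complete Latin square is easy to check by machine. The sketch below uses a standard construction for even order, which is our choice for illustration and is not taken from the papers cited above: a sequence 0, 1, n-1, 2, n-2, ... whose successive differences modulo n are all distinct is added to itself to fill the square.

```python
def complete_latin_square(n):
    """Complete Latin square of even order n on the symbols 0, ..., n - 1."""
    if n % 2 != 0:
        raise ValueError("this construction requires an even order")
    # Sequence 0, 1, n-1, 2, n-2, ... with distinct successive differences mod n.
    seq = [0]
    low, high = 1, n - 1
    while len(seq) < n:
        seq.append(low)
        low += 1
        if len(seq) < n:
            seq.append(high)
            high -= 1
    return [[(seq[i] + seq[j]) % n for j in range(n)] for i in range(n)]


def is_latin(square):
    """Every symbol occurs once in each row and once in each column."""
    n = len(square)
    symbols = list(range(n))
    rows_ok = all(sorted(row) == symbols for row in square)
    cols_ok = all(sorted(square[i][j] for i in range(n)) == symbols for j in range(n))
    return rows_ok and cols_ok


def is_complete(square):
    """Every ordered pair of distinct symbols is adjacent once in rows and once in columns."""
    n = len(square)
    row_pairs = {(row[j], row[j + 1]) for row in square for j in range(n - 1)}
    col_pairs = {(square[i][j], square[i + 1][j]) for i in range(n - 1) for j in range(n)}
    # There are exactly n(n-1) adjacent positions in each direction, so n(n-1)
    # distinct pairs means that each ordered pair occurs exactly once.
    return is_latin(square) and len(row_pairs) == n * (n - 1) == len(col_pairs)


for row in complete_latin_square(4):
    print(row)
print(is_complete(complete_latin_square(6)))
```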

Kiefer  and  Wynn  (1981)  proposed  a  two-stage  design  strategy  for  block 
designs  with  correlated  error  structures:  first,  limit  consideration  to  a 
class  of  designs  known  to  be  efficient  in  the  absence  of  correlation  (such  as 
balanced  incomplete  block  designs);  then,  choose  a  design  from  within  that
class  that  offers  some  protection  against  possible  correlation.  They
considered  a  "nearest  neighbor"  correlation  structure  and  defined  the  class  of
equineighborhood  designs,  which  involve  restrictions  on  the  number  of  times
pairs  of  treatments  can  be  adjacent  to  one  another.  Cheng  (1983)  presented 
methods  for  constructing  such  designs. 

Bechhofer  and  Tamhane  (1981)  studied  designs  for  experiments  to  compare 
v  -  1  test  treatments  with  a  control  treatment.  They  introduced  the  concept 
of  balanced  treatment  incomplete  block  (BTIB)  designs,  which  are  symmetric 
with  respect  to  the  test  treatments  and  between  each  test  treatment  and  the 
control.  However,  the  control  may  be  replicated  more  often  than  the  test 
treatments.  Bechhofer  and  Tamhane  (1983)  further  studied  these  designs  and 
gave  tables  of  optimal  allocations  of  runs  among  v  treatments  in  order  to 
make  one-  or  two-sided  confidence  statements.  Majumdar  and  Notz  (1983) 
derived  designs  for  this  problem  with  respect  to  a  variety  of  optimality 
criteria.  Most  of  their  designs  belonged  to  the  class  of  BTIB  designs  defined 
by  Bechhofer  and  Tamhane.  Constantine  (1983)  showed  that  a  simple  way  to 
generate  a  design  which  minimizes  the  average  variance  of  the  treatment- 
control  comparisons  is  to  reinforce  a  balanced  incomplete  block  design  (for 
the  v  -  1  test  treatments)  by  adding  the  control  treatment  to  each  block. 
Again,  this  is  a  BTIB  design. 
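
The reinforcement idea can be sketched in a few lines. The example below assumes a particular balanced incomplete block design for seven test treatments (the seven-block design with blocks of size three, in which every pair of treatments occurs together once), labels the control as treatment 0, and checks the two balance properties described above: each test treatment occurs with the control equally often, and each pair of test treatments occurs together equally often. The function names and the choice of starting design are illustrative assumptions only.

```python
from collections import Counter
from itertools import combinations


def reinforce_with_control(blocks, control=0):
    """Add the control treatment to every block of a design for the test treatments."""
    return [(control,) + tuple(block) for block in blocks]


def concurrences(blocks):
    """Count how many blocks contain each unordered pair of treatments."""
    counts = Counter()
    for block in blocks:
        for pair in combinations(sorted(block), 2):
            counts[pair] += 1
    return counts


# A balanced incomplete block design for test treatments 1..7:
# 7 blocks of size 3, every pair of test treatments together exactly once.
bibd = [(1, 2, 3), (1, 4, 5), (1, 6, 7), (2, 4, 6), (2, 5, 7), (3, 4, 7), (3, 5, 6)]

design = reinforce_with_control(bibd)
counts = concurrences(design)

control_vs_test = {counts[(0, t)] for t in range(1, 8)}
test_vs_test = {counts[pair] for pair in combinations(range(1, 8), 2)}
print("control/test concurrences:", control_vs_test)
print("test/test concurrences:   ", test_vs_test)
```

In this sketch the control appears in every block, so it is replicated more often than any test treatment, as the text notes may happen in a BTIB design.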


10.  NONLINEAR  MODELS 

Nonlinear  models  play  an  important  role  in  describing  physical,  chemical, 
and  engineering  systems.  By  nonlinear  models,  we  refer  to  situations  in  which 
the  response  from  the  ith  experimental  run  is  described  by  the  model:

$$Y_i = \eta(x_i, \theta) + \varepsilon_i \qquad (10.1)$$

where  the  response  function  $\eta$  is  a  nonlinear  function  of  the  parameter
vector  $\theta$.  Nonlinear  models  typically  arise  when  the  researcher  has  in  mind  a
theory  that  describes  the  effect  of  the  predictors  on  the  observed
response  $Y_i$.  Even  if  the  theory  is  incomplete,  the  nonlinear  model  may  be
more  useful  than  a  competing  empirical  model  as  a  first  approximation  because 
it  more  effectively  captures  the  main  features  of  the  data.  It  is  often 
possible  to  obtain  more  accurate  and  more  parsimonious  models  by  exploiting 
the  researcher's  scientific  knowledge  to  suggest  a  nonlinear  model. 

Special  design  problems  arise  for  nonlinear  models  because  the  best 
design  depends,  in  general,  on  the  unknown  parameter  values.  Investigators 
are  thus  in  the  rather  paradoxical  position  of  having  to  know  at  the  design 
stage  the  very  quantities  that  they  are  conducting  the  experiment  to 
estimate.  Two  reviews  of  work  on  nonlinear  models,  including  experimental
design,  are  Cochran  (1973)  and  Bates  and  Hunter  (1984). 

Fisher  (1922)  was  perhaps  the  first  statistician  to  study  experimental 
design  for  a  nonlinear  model.  He  considered  the  problem  of  designing 
experiments  for  the  estimation  of  the  density  of  small  organisms  in  a  liquid 
by  means  of  a  series  of  dilutions.  Box  and  Lucas  (1959),  in  a  pioneering 
paper  on  experimental  design  for  nonlinear  models,  showed  how  the  D-optimality 
criterion  could  be  applied  by  working  with  a  linearized  approximation  to  the 
nonlinear  model  and  using  the  experimenter's  initial  guesses  as  to  the  likely 
values  of  the  parameters.  Box  and  Hunter  (1965a,  1965b)  advocated  a  sequential
strategy,  in  which  the  parameter  estimates  are  updated  after  each  trial  and 
the  next  design  point  is  then  chosen  with  the  aid  of  the  improved  estimates. 
Hill  (1980)  showed  that  if  a  nonlinear  model  is  linear  in  some  of  the 
parameters,  then  the  D-optimal  design  does  not  depend  on  the  value  of  the 
linear  parameters. 
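
The linearization idea can be illustrated with a small numerical sketch. We assume, purely for illustration, the two-parameter model eta = v_max * x / (k_m + x), in which v_max enters linearly and k_m nonlinearly, and we search a grid for the two-run design that maximizes det(F'F), where F holds the partial derivatives of eta with respect to the parameters, evaluated at the experimenter's initial guesses. The parameter guesses, the design region, the grid, and all function names are hypothetical.

```python
def jacobian_row(x, v_max, k_m):
    """Partial derivatives of eta = v_max * x / (k_m + x) with respect to (v_max, k_m)."""
    return (x / (k_m + x), -v_max * x / (k_m + x) ** 2)


def d_criterion(design, v_max, k_m):
    """det(F'F) for a design (a sequence of x values), F evaluated at the guesses."""
    rows = [jacobian_row(x, v_max, k_m) for x in design]
    a = sum(r[0] * r[0] for r in rows)
    b = sum(r[0] * r[1] for r in rows)
    c = sum(r[1] * r[1] for r in rows)
    return a * c - b * b


def best_two_point_design(v_max, k_m, x_max, steps=80):
    """Grid search over pairs 0 < x1 < x2 <= x_max for the largest D-criterion."""
    grid = [x_max * i / steps for i in range(1, steps + 1)]
    pairs = [(lo, hi) for i, lo in enumerate(grid) for hi in grid[i + 1:]]
    return max(pairs, key=lambda d: d_criterion(d, v_max, k_m))


# Guessed parameter values (assumptions, not estimates from any real experiment).
print(best_two_point_design(v_max=1.0, k_m=1.0, x_max=8.0))
print(best_two_point_design(v_max=5.0, k_m=1.0, x_max=8.0))
print(best_two_point_design(v_max=1.0, k_m=2.0, x_max=8.0))
```

With these settings the first two calls select the same pair of points, consistent with the result attributed to Hill (1980) above that the guess for a linear parameter does not matter, while changing the guess for the nonlinear parameter k_m moves the lower design point. This shows concretely how the design depends on the unknown parameter values.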


In  designing  experiments  to  discriminate  among  several  conjectured 
models,  which  was  discussed  in  Section  5,  special  attention  has  been  paid  to
the  case  of  nonlinear  models.  This  problem  typically  arises  when  there  are
competing  theories  to  explain  the  effect  of  the  predictors  on  the  response, 
each  of  which  implies  a  different  model  function  $\eta$.  Experiments  are  then
desired  that  can  discriminate  among  the  models,  and  so  suggest  which  of  the 
proposed  theories  seems  to  be  the  most  valid.  See  Hunter  and  Reiner  (1965), 
Box  and  Hill  (1967),  Atkinson  and  Cox  (1974),  Atkinson  and  Fedorov  (1975a, b), 
and  Atkinson  (1981).

Other  research  work  has  considered  experimental  design  for  particular 
types  of  nonlinear  models.  For  example,  Currie  (1982)  compared  different 
designs  for  estimating  the  parameters  of  the  Michaelis-Menten  equation,  which 
is  often  used  to  model  enzyme  kinetics. 

A  number  of  researchers  have  studied  the  design  of  efficient  experiments 
for  quantal  response  data,  in  which  the  probability  of  observing  a  response  is 
assumed  to  be  a  function  (typically  nonlinear)  of  some  underlying  variables, 
such  as  dose  or  stress.  A  common  goal  of  quantal  response  experiments  is  to 
estimate  a  stress  at  which  the  probability  of  observing  a  response  attains  a
pre-specified  level,  such  as  0.5.  Robbins  and  Monro  (1951)  proposed  a
sequential  design  scheme  (known  as  stochastic  approximation)  for  this  problem, 
in  which  the  stress  for  each  experimental  run  is  determined  by  the  stress  and 
the  outcome  of  the  previous  run.  Much  subsequent  work  has  continued  to 
exploit  a  sequential  approach  (see,  for  example,  Wetherill  1963,  Tsutakawa 
1972,  Chernoff  1975,  Owen  1975,  and  Anbar  1978).  Other  authors  have  developed 
non-sequential  design  schemes  for  quantal  response  experiments.  Meeker  and 
Hahn  (1977)  proposed  experimental  designs  to  estimate  the  probability  of 
response  at  a  specified  stress,  when  it  is  assumed  that  the  probability  at
that  stress  is  close  to  zero  or  one,  and  that  the  probability  of  response  can
be  accurately  represented  by  a  logistic  regression  model.  Abdelbasit  and 
Plackett  (1983)  also  considered  logistic  regression  models  and  derived  designs 
that  maximize  information  on  the  parameters  in  the  model.  Maxim,  Hendrickson, 
and  Cullen  (1977)  proposed  designs  for  experiments  with  two  stress  variables. 
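
The sequential scheme of stochastic approximation mentioned above can be sketched as follows. Each run's stress is computed from the previous stress and the previous outcome: the stress is lowered after a response and raised after a non-response, with steps that shrink as the experiment proceeds. The step sequence gain / n, the logistic response curve used to simulate outcomes, the true median stress of 2.0, and all names are assumptions made for illustration; in a real experiment the response curve is of course unknown.

```python
import math
import random


def robbins_monro(respond, target, start, n_runs, gain):
    """Sequentially adjust the stress so that P(response) approaches `target`."""
    x = start
    for n in range(1, n_runs + 1):
        y = respond(x)                  # 1 if a response is observed, else 0
        x -= (gain / n) * (y - target)  # next stress uses only the last stress and outcome
    return x


rng = random.Random(1)
true_median = 2.0  # hypothetical value, used only to simulate outcomes


def simulated_unit(stress):
    """Simulate one quantal response from an assumed logistic curve."""
    p = 1.0 / (1.0 + math.exp(-(stress - true_median)))
    return 1 if rng.random() < p else 0


estimate = robbins_monro(simulated_unit, target=0.5, start=0.0, n_runs=5000, gain=4.0)
print(round(estimate, 2))
```

The choice of gain affects how quickly the estimate settles; here it was picked by hand for the simulated curve, which would not be possible when the curve is unknown.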


11.  FUTURE  DIRECTIONS 

In  the  preceding  sections  we  have  reviewed  statistical  research  on 
experimental  design;  in  this  section  we  discuss  some  areas  that  we  think 
deserve  attention  in  the  years  ahead.  We  will  begin  our  discussion  with  some 
areas  that  are  natural  outgrowths  of  the  recent  efforts  in  experimental  design 
that  were  discussed  in  the  preceding  sections.  Then,  in  individual 
subsections,  we  will  discuss  some  other  areas  that  have  not  been  widely 
explored:  designs  for  sequential  experimentation,  considering  multiple  design 
objectives,  planning  experiments  in  the  real  world,  education,  and  interactive 
computer  programs  for  designing  experiments. 

The  increasing  awareness  of  the  importance  of  assumptions  upon  which 
statistical  methods  are  based  has  led  to  much  useful  research  in  design 
robustness  (see  Section  5).  We  believe  that  the  problem  of  designing 
experiments  that  will  not  be  overly  sensitive  to  assumptions  should  be  a  key 
concern  of  statisticians  in  coming  years.

Factorial  designs  have  traditionally  been  used  to  study  a  relatively 
small  number  of  factors.  In  many  experiments,  however,  a  large  number  of 
factors  (perhaps  50  or  100)  may  be  initially  suspected  to  be  important.  The 
use  of  highly  saturated  or  even  super-saturated  factorial  designs  to  study 
such  systems  is  a  problem  that  should  be  studied  further.  It  has  been 
reported  that  in  Japan  experiments  with  more  than  100  process  variables  have 
been  successfully  performed  in  industry.  Especially  influential  in  Japan  have
been  Taguchi's  ideas  on  orthogonal  arrays  (see  Taguchi  and  Wu  1979,  Phadke
1982).  Another  direction  worthy  of  consideration,  suggested  by  Tukey,  is  the 
use  of  designs  that  are  not  orthogonal,  but  in  which  the  correlations  of  the 
parameter  estimates  are  quite  small.  The  idea  here  is  that  by  sacrificing 
some  orthogonality,  it  may  be  possible  to  gain  much  in  terms  of  the  number  of 
factors  that  can  be  studied. 

The  design  of  experiments  for  mixture  problems  is  likely  to  remain  a 
topic  of  considerable  interest.  Some  of  the  particular  questions  that  should 
stimulate  more  research  are  designs  to  combine  both  mixture  and  process 
variables,  designs  to  study  only  a  limited  region  in  the  mixture  simplex,  and 
computer  algorithms  for  design,  especially  when  additional  constraints  on  the 
mixture  components  yield  a  complicated  design  region. 

The  study  of  experimental  design  for  nonlinear  models  has  lagged  far 
behind  the  research  devoted  to  experimental  design  for  linear  models.  One 
reason  for  this  scarcity  of  work  is  the  inherent  difficulty,  discussed  in 
Section  10,  that  designs  generally  depend  on  the  unknown  parameter  values. 
Nonetheless,  nonlinear  models  are  valuable  tools  for  studying  processes  in
the  chemical,  physical,  and  engineering  sciences;  more  research  on  designing 
experiments  for  nonlinear  models  should  certainly  be  undertaken.  One 
interesting  question  that  has  not  been  studied  is  the  design  of  experiments 
for  nonlinear  models  that  are  proposed  as  tentative  empirical 
approximations.  A  good  design  should  then  allow  for  estimation  of  the 
proposed  model,  and  should  also  provide  a  basis  for  suggesting  modifications 
to  the  model  so  that  it  will  more  accurately  represent  the  process  under 
study.  Another  problem  that  deserves  further  attention  is  the  link  between 
empirical  models  and  underlying  nonlinear  mechanisms.  This  possibility  was 
first  noted  by  Box  and  Youle  (1955),  who  found  that  a  fitted  response  surface 
model  for  a  chemical  experiment  suggested  a  theoretical  nonlinear  model. 

11.1  Designs  for  Sequential  Experimentation

Box  and  Youle  (1955)  described  the  iterative  nature  of  experimentation  in 
terms  of  a  cycle  that  may  be  repeated  many  times  in  the  course  of  an 
investigation.  It  consists  of  four  steps:  conjecture  (experimenter 
formulates  an  idea,  hypothesis,  model,  theory),  design  (experimenter  plans  the 
experiment),  experiment  (experimenter  collects  the  data),  and  analysis 
(experimenter  extracts  useful  information  from  the  data).  The  analysis  will 
frequently  cause  the  experimenter  to  modify  the  original  conjecture,  or  even 
to  completely  abandon  it,  in  favor  of  a  better  conjecture.  A  new  cycle  then 
begins.  In  the  chemical,  physical,  and  engineering  sciences,  the  time  to 
complete  a  cycle  is  most  often  much  less  than  that  required  in  agricultural 
research,  where  the  basic  concepts  of  experimental  design  originated.  Many 
new  design  possibilities  can  thus  be  exploited,  since  the  results  of  previous 
experiments  are  available  to  aid  in  the  planning  of  future  experiments;
however,  new  problems  arise  on  which  some  research  has  been  done,  but  more 
work  is  definitely  in  order. 

Response  surface  methodology  has  always  stressed  a  sequential  approach, 
as  is  illustrated  by  the  discussion  in  Box,  Hunter,  and  Hunter  (1978,  Chapter 
15).  Even  in  response  surface  studies,  however,  the  experimental  plan 
typically  proceeds  by  stages,  and  when  stages  involve  many  runs  (as  is  likely
if  many  factors  are  involved),  useful  methods  might  be  proposed  to  further 
decompose  each  stage.  For  example,  the  first  runs  of  a  stage  might  indicate 
that  another  region  in  the  factor  space  is  more  interesting  than  the  region 
currently  being  explored,  that  unexpected  interactions  or  other  complications 
are  present,  that  some  transformation  of  the  predictor  variables  is  called 
for,  or  that  unexpected  simplifications  seem  to  be  possible  (e.g.,  one  or  more 
factors  are  inert,  or  a  simpler  model  form  is  appropriate  —  perhaps,  though 
not  necessarily,  after  transformation).  In  such  cases,  it  would  be  useful  to
design  the  experiment  in  such  a  way  that  initial  plans  could  be  modified  well 
short  of  completion. 

Some  authors  have  considered  methods  to  break  down  experiments  into 
smaller  pieces.  Box  (1982)  showed  how  blocking  strategies  could  be  used  to 
construct  a  sequential  scheme  for  experimentation.  Another  strategy  that 
might  be  useful  here  is  to  employ  the  3/4  factorial  designs  introduced  by 
John  (1962).  Daniel  (1973)  studied  the  "one-at-a-time"  approach,  in  which 
experimental  runs  are  added  sequentially  from  an  overall  factorial  design  to 
create  useful  designs  at  each  step.  A  similar  idea  was  proposed  by  Smith  and 
Schmoyer  (1982)  (see  Sect.  8).  A  related  problem  arises  when  it  is  difficult 
or  expensive  to  alter  the  settings  of  some  factors  in  an  experiment.  Draper 
and  Stoneman  (1968)  and  Dickinson  (1974)  have  studied  the  problem  of  designing 
factorial  experiments  when  it  is  desired  to  minimize  the  number  of  changes  in 
factor  settings. 
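
To make concrete what minimizing the number of changes in factor settings can mean for a two-level factorial, the sketch below orders the runs by a reflected Gray code, so that successive runs differ in exactly one factor, and compares the total number of setting changes with that of the usual standard order. This ordering is our own illustration and is not claimed to be the procedure of the papers just cited; the function names are hypothetical. A fixed run order of this kind gives up randomization, so it is a device for hard-to-change factors rather than a general recommendation.

```python
from itertools import product


def gray_code_run_order(n_factors):
    """Order the 2**n_factors runs so that consecutive runs differ in one factor."""
    runs = []
    for i in range(2 ** n_factors):
        g = i ^ (i >> 1)  # reflected binary Gray code of i
        runs.append(tuple(1 if (g >> j) & 1 else -1 for j in range(n_factors)))
    return runs


def count_setting_changes(runs):
    """Total number of factor-level changes between consecutive runs."""
    return sum(
        a != b for prev, cur in zip(runs, runs[1:]) for a, b in zip(prev, cur)
    )


gray_order = gray_code_run_order(3)
standard_order = list(product((-1, 1), repeat=3))

print("Gray-code order changes:", count_setting_changes(gray_order))
print("Standard order changes: ", count_setting_changes(standard_order))
```

Since each of the 2**k - 1 transitions between distinct runs must change at least one setting, an order with one change per transition cannot be improved upon in total number of changes.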

It  has  been  suggested,  with  some  irony,  that  the  best  time  to  design  an 
experiment  is  after  the  experiment  has  been  completed  because  one  then  has 
more  knowledge  of  the  process  under  study  —  what  variables  are  important, 
over  what  ranges,  in  what  metrics,  and  so  on.  By  designing  experiments 
sequentially,  we  can,  in  a  sense,  approximate  this  happy  (but  impossible) 
situation  by  "peeking"  at  the  answer  and  modifying  the  design  accordingly.
Such  a  sequential  approach  would  be  optimal  in  the  sense  that  the  planning  of
each  experimental  run  takes  into  account  all  the  information  available  up  to 
the  time  it  is  performed.  However,  using  one  of  the  standard  design  criteria, 
in  which  the  settings  for  each  run  are  precisely  specified,  the  investigator 
would  lose  the  benefits  of  randomization.  In  general,  the  consequences  of 
such  a  loss  are  not  known  and  might  be  a  rewarding  topic  for  future  research. 

11.2  Considering  Multiple  Design  Objectives


Box  and  Draper  (1975)  listed  14  different  goals  which  might  be  important 
in  designing  a  response  surface  experiment.  Additional  goals  were  listed  by 
Herzberg  (1980).  Most  of  the  goals  in  those  lists  are  potentially  important 
in  almost  any  experiment.  And  the  lists  are  certainly  not  exhaustive. 
Experimenters'  purposes  are  complex  and  often  change  to  reflect  new 
circumstances.  Capturing  their  goals  in  mathematical  terms  is  an  intellectual
challenge.  Box  (1982)  stressed  the  need  to  design  experiments  with  all 
important  goals  in  mind,  not  just  one  or  two.  This  point  is  especially 
important  in  light  of  the  influence  of  optimal  design,  which  usually  employs  a 
single  criterion  function.  It  is  good  that  computer  programs  that  have  been 
developed  to  search  for  optimal  designs  (see  Section  4)  also  compute  and 
output  other  characteristics  of  the  best  designs  found,  and  not  just  the 
single  criterion  by  which  they  search. 

Some  useful  research  might  be  devoted  to  exploring  which  goals  make 
similar  demands  of  a  design  and  which  goals  make  contradictory  demands.  As  an 
example,  one  way  to  achieve  precise  estimation  and  to  allow  adequate  checks 
for  lack  of  fit  is  to  increase  the  sample  size;  these  goals  are 
complementary.  Increasing  the  sample  size,  however,  contradicts  the  goal  of 
minimizing  cost.  Another  example  concerns  experimental  designs  to 
discriminate  among  several  nonlinear  models.  As  was  pointed  out  by  Hill, 
Hunter,  and  Wichern  (1968),  such  designs  may  be  quite  inefficient  for 
estimating  the  parameters  of  the  chosen  model.  They  proposed  alternative 
designs  that  also  took  parameter  estimation  into  consideration.  Further  study 
might  suggest  effective  compromises  which  allow  several  goals  to  be  met 
reasonably  well.  Some  of  the  work  described  in  Section  5  has  attempted  to  do 
this,  compromising  between  efficient  estimation  of  an  assumed  model  and  the 
ability  to  estimate  a  more  complicated  model.  A  related  issue  is  the  problem
of  designing  an  experiment  that  has  more  than  one  response  (see  Draper  and 
Hunter  1966,  1967).  An  efficient  design  for  one  response  may  not  be  efficient 
for  some  other  response.  Again,  methods  are  needed  which  allow  for  some 
compromise,  so  that  a  design  which  is  reasonably  efficient  for  all  the 
responses  can  be  achieved.  Further  work  in  this  area  would  be  welcome. 

11.3  Planning  Experiments  in  the  Real  World 

Before  any  formal  experimental  plan  can  be  laid  out,  it  is  essential  to 
state  clearly  the  goals  of  the  experiment  and  to  discuss  possible  factors  that 
might  substantially  affect  the  experimental  results  and  the  ability  to 
generalize  from  them.  Although,  realistically,  there  may  be  infinitely  many 
factors  that  might  affect  the  results,  the  experimental  design  will  be  able  to 
study  only  a  small  subset  of  them.  Thus  decisions  must  be  made  as  to  which 
factors  will  be  systematically  varied  and  over  what  ranges,  which  factors  held 
constant,  and  which  factors  that  are  not  subject  to  control  should  be  observed 
for  possible  use  as  covariates  in  the  analysis.  (See  Hahn  1982a  for  a  useful 
discussion.)  The  likely  effect  of  the  factors  on  the  experimental  results
should  also  be  considered.  Sometimes  current  knowledge  of  the  basic  mechanism 
of  the  system  being  studied  may  suggest  a  useful  nonlinear  model  and  the 
experiment  should  then  be  designed  with  this  model  in  mind.  It  is  usually 
hoped  that  the  remaining  factors,  whose  effects  will  all  enter  into  the 
"error"  term  in  (5.1),  are  unimportant.  Furthermore,  it  is  hoped  that  if  any 
of  these  presumably  unimportant  factors  do  have  large  effects,  randomization 
will  succeed  in  neutralizing  them.  Sometimes  the  impact  of  such  lurking 
variables  becomes  evident  only  when  further  experimentation  is  unable  to 
replicate  the  original  results  and  a  search  for  additional  important  factors 
is  initiated.  The  possible  presence  of  influential  lurking  variables  helps
explain  the  importance  of  replicating  scientific  results  by  more  than  one 
experimenter. 

Every  good  experimental  program  should  consider  the  issues  mentioned  in 
the  preceding  paragraph.  They  are  especially  important,  however,  for 
statisticians  who  aid  in  planning  experiments,  since  statisticians  will  often
lack  intimate  knowledge  of  the  subject  area  in  which  the  experiment  is  being 
conducted.  Consulting  statisticians  have  found,  as  Fisher  did,  that  asking 
questions  to  clarify  these  issues  and  to  learn  about  the  experiment  is  often 
valuable,  not  just  for  their  own  enlightenment,  but  also  to  force 
experimenters  to  explain  and  justify  their  ideas.  As  Cochran  and  Cox  (1957) 
observed:  "The  statistician  who  expects  that  his  contribution  to  the  planning
will  involve  some  technical  matter  in  statistical  theory  finds  repeatedly  that
he  makes  a  much  more  valuable  contribution  simply  by  getting  the  investigator 
to  explain  clearly  why  he  is  doing  the  experiment,  to  justify  the  experimental 
treatments  whose  effects  he  proposes  to  compare,  and  to  defend  his  claim  that 
the  completed  experiment  will  enable  its  objectives  to  be  realized"  (p.  10). 

An  illustrative  example  is  the  story  in  Hunter  (1981a)  of  a  successful 
experimental  planning  session  in  which  the  statistician  did  no  more  than  to 
ask  the  two  principal  investigators  to  explain  the  goals  of  the  experiment.
The  investigators  were  surprised  to  discover  that  each  had  a  different
understanding  of  the  goals,  but  after  45  minutes  of  vigorous  debate,  they  had 
established  a  clear  consensus. 

How  can  a  statistician  learn  about  the  goals  of  an  experiment?  What  are 
the  important  questions  to  ask  at  the  initial  planning  phase  of  an 
investigation?  How  can  a  statistician  elicit  information  which  he  may  regard
as  crucial  to  a  good  design,  but  that  the  experimenter  regards  as  marginal?
Joiner  and  Pollack  (1982,  p.  334)  listed  a  number  of  issues  that  they  have 
repeatedly  found  to  be  important.  The  importance  of  good  consulting  skills 
and  the  benefits  to  be  derived  from  working  in  this  area  are  underrated  by 
statisticians.  For  information  on  statistical  consulting,  see  Boen  and  Zahn 
(1982),  McCulloch  et  al.  (1982),  Joiner  (1982),  Zahn  and  Isenberg  (1983),  and
the  references  listed  therein.  One  important  way  for  statisticians  to  help 
themselves  is  to  learn  more  about  the  subject  matter  field(s)  in  which  they 
consult.  They  should  continually  ask  questions  about  the  theory  underlying  an 
experiment.  A  deeper  understanding  of  the  basic  mechanisms  that  govern  the 
process  being  studied  can  often  suggest  more  efficient  ways  to  design  and 
analyze  experiments.  We  believe  that  the  role  of  the  statistician  as  a 
planner  of  experiments  is  deserving  of  special  consideration.  As  one 
suggestion,  statisticians  who  have  designed  many  experiments  might  consider 
sharing  some  of  the  things  they  do  that  seem  to  be  most  helpful  to  their 
clients,  including  techniques  they  use  to  ensure  that  they  have  a  clear 
understanding  of  the  nature  of  the  experiment. 

One  of  the  strengths  of  statistical  experimental  design  is  the  ability  to 
view  experimentation  in  terms  of  abstract  mathematical  models.  This  abstract 
view  has  allowed  statisticians  to  recognize  common  ground  in  experiments  that 
otherwise  appear  to  be  quite  different  and  has  facilitated  the  invention  of 
many  designs  that  are  useful  across  a  broad  range  of  subject  areas.  In  many 
practical  applications,  however,  idealized,  abstract  experimental  plans  must 
be  tempered  by  the  reality  of  the  particular  experimental  setting  at  hand.
(See,  for  example,  the  discussion  in  Cox  1958,  Chapter  9.)  In  particular,
designs  that  have  been  derived  using  mathematical  criteria  should  be  used  as  a 
guideline,  not  followed  slavishly.  Consulting  statisticians  have  often  found 
that  a  visit  to  the  laboratory,  plant,  or  field  where  the  experiment  will 
actually  be  carried  out  is  an  invaluable  aid  in  proposing  a  design.  An
experimental  design  must  be  tailored  to  fit  the  experiment  and  not  vice  versa. 

An  experimental  design  that  looks  great  on  paper  is  of  little  use  if  it 
is  not  followed.  Sometimes,  in  the  middle  of  an  experiment,  the  investigator
discovers  that  some  of  the  planned  runs  cannot  be  made,  or  that  the  experiment 
must  be  terminated  early.  Some  research  on  how  to  proceed  when  the  original 
experimental  plan  cannot  be  carried  to  completion  was  reported  in  Sections  6 
and  8,  but  more  is  needed. 

Statistical  consultants  who  propose  experimental  designs  must  cooperate 
closely  with  the  experimenter,  so  that  the  latter  clearly  understands  the 
design  and  why  it  is  important.  In  this  regard,  it  is  desirable  to  stress 
simplicity  in  developing  new  experimental  designs,  but  this  is  a  property  that 
is  rarely  mentioned.  Research  indicating  what  difficulties  experimenters 
encounter  in  applying  frequently  advocated  designs  might  be  of  great  use  to 
statisticians  who  design  experiments.  One  suggestion  is  that  the  statistician 
actually  participate  in  running  the  experiment,  or  at  least  be  present  during 
the  collection  of  data,  in  order  to  obtain  first-hand  knowledge  of  all  the 
unforeseen  problems  that  are  encountered.  This  practice  can  be  especially 
helpful  when  there  are  some  statistically  important  factors  that  the 
experimenter  regarded  as  inconsequential,  and  never  mentioned  to  the 
statistician. 

Some  readers  may  think  that  the  questions  raised  in  this  section  are  too 
trivial  to  be  the  subject  of  statistical  interest.  We  don't  think  so.  To  the 
contrary,  we  think  that  these  are  the  most  important  questions  to  address  and 
we  encourage  more  statisticians  and  scientists  to  share  their  experiences  with 
problems  they  have  encountered  in  planning  experimental  programs.  The  article
by  Hahn  (1984)  is  an  excellent  example  of  what  we  have  in  mind.  His 
description  of  six  experiments  in  which  he  participated  as  a  statistical
consultant  illustrates  how  effectively  a  well-designed  experiment  can  work  and
also  how  ingenuity  must  often  be  used  to  make  the  design  fit  the  needs  of  the 
experimenter.  Joiner  (1977)  described  the  design  and  analysis  of  an 
experiment  with  a  number  of  unusual,  non-standard  problems  that  had  to  be 
solved.  See  also  Hooke  (1980),  Hunter  (1981a, b),  and  Bishop,  Peterson,  and 
Trayser  (1982).  We  think  more  articles  of  this  nature  would  benefit  all  of 
us. 

Discussion  of  the  practical  problems  encountered  in  planning  real-world 
experiments,  sample  surveys,  and  censuses  should  be  included  in  the  training 
of  every  scientist  and  of  every  statistics  student.  Too  often  these  problems 
are  swept  aside  in  an  instructor's  desire  to  teach  material  on  the  theory, 
rather  than  the  practice,  of  statistics. 

There  are  indications  that  increasing  use  is  being  made  of  statistically 
designed  experiments.  In  Europe,  for  example,  especially  in  chemistry,  the 
use  of  designs  is  becoming  widespread,  following  the  leadership  of 
Phan-Tan-Luu,  Carlson,  and  others  (see,  for  example,  Carlson,  Lundstet, 
Phan-Tan-Luu,  and  Mathieu  1983,  Carlson,  Nilsson,  and  Stromqvist  1983,  Lazaro, 
Bouchet,  and  Jacquier  1977,  and  Brunei,  Itier,  Commeyras,  Phan-Tan-Luu,  and 
Mathieu  1979). 

11.4  Education 

The  growth  of  knowledge  in  experimental  design  over  the  last  25  years  has 
been  tremendous.  Many  scientists  are  now  aware  that  statistically  designed 
experiments  can  greatly  increase  the  efficiency  of  their  research  work,  which 
is  to  the  credit  of  the  many  individuals  who  have  worked  in  the  field,  as  well 
as  to  journals  such  as  Technometrics  that  have  adopted  as  a  clear  priority  the 
dissemination  of  statistical  advances  in  the  chemical,  physical,  and
engineering  sciences.

Far  more  work  in  education  is  absolutely  necessary.  Many  experimenters
still  have  little  or  no  idea  of  even  such  basic  statistical  concepts  as 
blocking,  replication,  randomization,  and  factorial  design.  Every  applied 
statistician  has  a  collection  of  sad  tales  in  which  clients  asked  them  to 
salvage  a  poorly-planned  experiment  with  a  clever  analysis.  But  the  damage 
done  by  poor  experimental  design  is  irreparable.  No  amount  of  analysis  can 
create  information  where  none  exists  in  the  first  place.  By  contrast, 
well-planned  experiments  often  require  only  simple  analyses  in  order  to  reach 
clear,  unambiguous  conclusions.  Yet  our  impression  is  that  many 
poorly-planned  experiments  are  performed.  Indeed,  Mead  and  Pike  (1975),  in 
their  review  of  the  use  of  response  surface  methodology  in  the  biological 
sciences,  concluded  that  poorly-planned  experiments  were  more  the  rule  than 
the  exception. 

It  is  important  to  remember  how  much  we  can  accomplish  as  teachers. 
Statistics  should  be  an  essential  tool  for  science  and  engineering  students, 
but  it  is  often  regarded  as  a  subject  that  is  too  marginal  to  include  in  the 
curriculum.  Perhaps  one  effective  way  to  convince  colleagues  in  other  fields 
of  the  value  of  statistical  training  would  be  to  increase  the  emphasis  on  the 
design  of  experiments.  Many  important  ideas  in  the  design  of  experiments  can 
(and  should)  be  taught  in  introductory  statistics  courses  for  university 
students. 

Elements  of  experimental  design  can  also  be  taught  to  high  school,  junior 
high  school,  and  grade  school  students.  The  difference  between  correlation 
and  causation,  examples  of  nonsense  correlations,  how  to  set  up  valid 
comparative  experiments,  the  weakness  of  varying  one  variable  at  a  time,  and 
the  efficacy  of  two-level  factorial  experiments  are  useful  topics  to  cover. 
Students  need  to  be  taught  ideas  and  procedures  that  will  help  them  gather 
information  to  better  understand  the  world  around  them.  For  example,  Dalia
Sredni,  a  seventh  grader  in  California,  won  first  place  in  a  county  science
fair  by  conducting  a  2^3  factorial  design  to  study  the  effects  of  varying  the
oven  temperature,  baking  time,  and  amount  of  baking  soda  on  the  height,
consistency,  texture,  and  taste  of  a  cake.  Students  will  welcome  the 
opportunity  to  plan  and  conduct  experiments  of  their  own  choosing.  Active 
learning  through  experiments  can  be  a  refreshing  change  from  the  more  usual 
passive  learning  via  reading  and  listening,  allowing  students  to  enjoy  the 
element  of  surprise  and  the  thrill  of  discovery. 
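
To make the idea of a two-level factorial experiment concrete, the following is a minimal sketch of how the eight runs of a 2^3 design could be listed and how main effects could be estimated from them. The factor names follow the cake example above; the coded levels (-1 for low, +1 for high), the function names, and especially the response values are assumptions introduced only for illustration and are not data from the science-fair project.

```python
from itertools import product

# Factor names follow the cake example in the text; the coded levels
# (-1 = low, +1 = high) are the usual convention for two-level designs.
FACTORS = ("oven_temperature", "baking_time", "baking_soda")


def full_factorial(n_factors):
    """Return all 2**n_factors runs as tuples of -1/+1 coded levels."""
    return list(product((-1, 1), repeat=n_factors))


def main_effects(design, responses):
    """Main effect of each factor: mean response at +1 minus mean at -1."""
    half = len(design) / 2
    effects = []
    for j in range(len(design[0])):
        # Signed sum of responses; dividing by half the number of runs
        # gives the difference between the two level means.
        contrast = sum(run[j] * y for run, y in zip(design, responses))
        effects.append(contrast / half)
    return effects


design = full_factorial(len(FACTORS))

# Hypothetical cake heights (cm), invented purely for illustration;
# they are NOT data from the science-fair project described in the text.
heights = [4.0, 4.5, 5.0, 5.5, 4.2, 4.7, 5.2, 5.7]

effects = dict(zip(FACTORS, main_effects(design, heights)))

for run, y in zip(design, heights):
    print(run, y)
print(effects)
```

In practice the eight runs would be carried out in random order, and the same contrasts could be extended to estimate two-factor interactions, which a one-variable-at-a-time plan cannot do.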

11.5  Interactive  Computer  Programs 

The  development  of  interactive  computer  programs  to  aid  in  the  design  of 
experiments  will  certainly  be  a  focal  point  in  the  years  ahead.  Researchers 
currently  have  at  their  disposal  computer  packages  which  allow  them  to  perform 
almost  any  standard  method  of  data  analysis.  No  comparable  software  in  the 
field  of  experimental  design  has  achieved  such  widespread  use.  Easy-to-use, 
interactive,  computer  packages  could  greatly  aid  researchers  in  choosing  a 
good  experimental  design.  More  important,  just  as  the  availability  of 
computer  software  has  led  to  a  revolution  in  the  kinds  of  data  analyses  which 
researchers  now  regard  as  essential  professional  tools,  so  will  software  for 
experimental  design  lead  to  a  revolution  in  researchers'  awareness  of 
statistically  designed  experiments. 

Some  comments  are  in  order  to  differentiate  between  the  research  that  we 
discussed  in  Section  4  and  the  ideas  of  the  preceding  paragraph.  The  research 
which  has  been  done  thus  far  has  been  devoted  largely  to  developing  numerical 
algorithms  which  can  generate  designs  with  certain  desirable  properties  (such 
as D-optimal designs). These algorithms have succeeded in finding many useful
designs; however, they are intended more for use in statistical research than
for  use  by  experimenters.  What  we  have  in  mind  here  is  the  development  of 
interactive  "expert"  software  packages  that  experimenters  themselves  could  use 
to  help  them  design  their  experiments,  in  much  the  same  way  they  would  obtain 
advice  from  a  statistical  consultant. 

The  development  of  good  computer  software  for  experimental  design  is  not 
a  panacea,  any  more  than  the  existence  of  statistical  analysis  packages  has 
been.  The  proliferation  of  sophisticated  statistical  analyses  has  included 
many  instances  where  the  use  of  a  statistical  technique  was  ill-advised  and 
led  to  unjustified  conclusions.  It  is  important  that  the  users  of 
experimental  design  packages  have  at  least  some  knowledge  of  the  basic 
statistical  principles  of  experimental  design.  It  is  also  important  that  the 
software  be  intelligent  enough  to  ask  the  experimenter  many  of  the  same 
questions  that  a  good  statistical  consultant  would  ask  and  to  recommend  that  a 
statistician  be  consulted  in  special  circumstances.  Thus  the  comments  of  the 
preceding  sections  regarding  the  role  of  the  statistician  in  planning 
real-world  experiments  and  as  an  educator  should  also  be  seen  as  essential 
companions  to  the  development  of  computer  programs  for  experimental  design. 

We  believe  that  the  benefits  of  experimental  design  software  far  outweigh 
the  potential  hazards.  Many  experiments  could  be  improved  substantially  by 
the  use  of  simple,  well-established  statistical  designs.  The  existence  of 
good software for experimental design would be a great step toward achieving
that goal.


12.  SOME  RECOMMENDATIONS 


We  conclude  with  three  sets  of  recommendations  addressed  to  experimenters 
and  statisticians.  The  theme  that  runs  through  these  recommendations  is  that 
important  advances  in  the  theory  and  practice  of  experimental  design  can  be 
achieved  if  experimenters  and  statisticians  converse  with  and  learn  from  one 
another.  If  communications  were  improved,  each  group  could  help  shape  future 
research  in  the  other's  area  in  significant  ways.  If  such  dialogue  is  to  bear 
fruit,  concerted  effort  will  be  needed  to  facilitate  visits  to  each  other's 
"camps"  to  learn  the  language,  customs,  problems,  and  goals  of  the  other 
group. 

1.  Teaching  and  Learning  about  Experimental  Design.  Experimenters:  if 
more  of  you  were  aware  of  the  concepts  and  techniques  of  statistical 
experimental  design  and  used  them  in  your  work,  research  efficiency  —  the 
amount  of  information  gained  per  unit  of  resources  used  (money,  time,  etc.)  — 
in  industry,  government,  and  academia  could  be  improved  substantially.  You 
need  to  learn  how  statistical  methods  can  be  combined  with  the  science  and 
technology  you  know  so  that  you  can  use  that  knowledge  more  effectively  in 
planning  experiments  to  acquire  new  information;  the  feeling  among  some 
scientists  that  statistical  methodology  is  a  substitute  for  that  knowledge  is 
an  unfortunate  misconception.  Statisticians  do  not  by  any  means  have  all  the 
answers,  but  they  have  thought  deeply  about  experimental  strategy.  Many  of 
you  could  help  yourselves  considerably  by  studying  what  statisticians  have 
written  on  the  subject  of  experimental  design.  Those  of  you  who  realize  the 
value  of  statistically  designed  experiments  could  help  your  colleagues  by 
explaining  to  them  the  benefits  of  such  an  approach. 

Statisticians:  whether  you  are  teaching  statistics  in  service  courses 
for  students  from  other  departments,  in  courses  for  your  own  students,  or 
in special courses (such as continuing education courses for persons in
industry),  teach  proportionately  more  design  and  less  analysis  than  you  do 
now.  Units  on  statistics  for  high  school  and  grade  school  students  should 
also  emphasize  (and  perhaps  begin  with)  experimental  design.  Of  the  two  large 
areas  of  statistics,  data  collection  and  data  analysis,  the  first  is  more 
important.  A  bad  design  yields  data  that  contain  little  information,  and  no 
amount of clever analysis can extract much information where little exists.
Put more positively, the return on investment in good statistical designs can
be quite handsome indeed. Talk to some experimenters who have tried it both
ways.  They'll  give  you  stories  you  can  tell  your  classes. 

2.  Using  Experimental  Design  in  Practice.  One  need  only  glance  through 
journals  such  as  Science  to  realize  how  infrequently  statistical  principles  of 
experimental design are used in the scientific study of complex systems.
Porter and Busch (1978) is an exception that proves this "rule." Research
workers  often  consult  statisticians,  if  at  all,  only  after  they  have  assembled 
their  data  and  encountered  difficulties  in  analyzing  them.  In  many  of  these 
situations,  the  application  of  basic  statistical  principles  of  experimental 
design  would  have  generated  data  that  were  much  more  informative,  not  to 
mention  much  easier  to  analyze.  Scientists  and  engineers  could  reap  great 
benefits  by  learning  and  using  these  principles. 

How useful is statistical experimental design in planning experiments?
For those of you who hold managerial positions and would like to gauge the
possible  benefits,  we  would  like  to  suggest  an  experiment.  In  the  next  year, 
divide  a  suitable  group  of  experimenters  in  your  organization  into  two 
subgroups,  using  randomization  and  perhaps  blocking.  Provide  one  of  the 
subgroups  with  training  in  statistical  experimental  design.  The  training 
should emphasize the practical rather than the theoretical aspects of the
subject. At the outset, decide on criteria that will be used to assess the
research efficiency of these individuals and how the judging will be done.
For example, after the passage of a suitable period of time, the reports
written  by  these  experimenters  could  be  judged  by  a  panel  of  experts. 
Alternatively,  hand  out  identical  assignments  to  two  experimenters  or  teams  of 
experimenters,  one  of  which  uses  statistically  designed  experiments  and  one  of 
which  does  not.  (If  you  carry  out  such  an  experiment,  we  would  like  to  know 
what happened.)
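
The assignment step of the experiment suggested above, dividing experimenters into two subgroups "using randomization and perhaps blocking," could be sketched as follows. This is only an illustration under stated assumptions: the blocking variable (department), the placeholder names, and the function `randomize_within_blocks` are hypothetical and do not come from the paper.

```python
import random


def randomize_within_blocks(blocks, seed=None):
    """Randomly split each block into a 'training' and a 'control' half.

    blocks maps a block label (here, a hypothetical department) to a list
    of experimenters. If a block has an odd number of members, the extra
    person is placed in the control group.
    """
    rng = random.Random(seed)
    assignment = {}
    for members in blocks.values():
        shuffled = list(members)
        rng.shuffle(shuffled)  # random order within the block
        half = len(shuffled) // 2
        for person in shuffled[:half]:
            assignment[person] = "training"
        for person in shuffled[half:]:
            assignment[person] = "control"
    return assignment


# Placeholder blocks and names, invented for illustration only.
blocks = {
    "chemistry": ["A", "B", "C", "D"],
    "engineering": ["E", "F", "G", "H"],
}

assignment = randomize_within_blocks(blocks, seed=1)
for person in sorted(assignment):
    print(person, assignment[person])
```

Randomizing separately within each block keeps the two subgroups balanced on the blocking variable, so that a difference in judged research efficiency is less likely to reflect differences between departments than the effect of the training.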

Stories  commonly  swapped  among  statistical  consultants  often  end  this 
way:  "If  they  had  only  come  to  talk  to  me  before  they  got  themselves  into 
that mess, I could have been so much more helpful." (Incidentally, lawyers,
doctors,  and  counselors  of  all  kinds  share  this  same  frustration.)  Yet  the 
payoff  from  good  design  is  frequently  so  great  that  it  is  worth  the  continued 
effort  needed  to  convince  people  to  talk  to  you  at  an  early  stage  in  their 
work.  The  general  problem  is  that  they  need  to  be  educated  about 
statistics.  The  specific  problem  of  persuasion  is  sometimes  solved  by 
communicating  to  potential  clients  "success  stories"  that  feature  situations 
or  experimenters  they  know  first-hand.  Save  such  stories  and  use  them. 

Consultants:  with  persistence,  with  creativity,  with  good  humor,  with 
patience,  try  to  get  clients  to  come  to  you  before  they  collect  their  data,  so 
you  can  give  them  advice  on  experimental  design  and  so  they  can  reap  the 
rewards. New consultants: lay some long-term plans to cope with this problem
and don't get discouraged. Discouraged consultants: revive your good
intentions; talk to consultants who have been able to get clients to come to
them early for advice on design and learn from them. Consultants who've been
successful in this way: publish some of your tips.

3. Establishing and Extending New Frontiers in Experimental Design.
Experimenters:  when  you  learn  about  the  work  that  statisticians  have  done  on 
experimental  design,  many  of  you  will  conclude  that  it  is  not  useful  for  the 
work  that  you  do.  In  fact,  some  of  you  will  see  an  enormous  gap  separating 
published statistical work and your own needs. You can help statisticians
do better research by communicating your perceptions to them.
Statisticians  need  feedback,  information,  and  advice  from  experimenters.  What 
forums  can  be  developed  to  expedite  such  communication?  At  professional 
meetings  —  both  those  of  experimenters  and  those  of  statisticians  —  special 
sessions  should  be  organized  for  discussion  of  such  topics.  Space  in 
statistics  journals  should  be  made  available  for  communications  from 
experimenters  concerning  research  work  that  they  would  like  to  see 
statisticians  undertake.  A  model  of  this  type  of  publication  is  provided,  for 
example,  by  Rosenblatt  and  Spiegelman  (1981).  In  a  reciprocal  manner, 
scientific  journals  should  provide  space  for  statisticians  to  make  "guest 
appearances"  as  Youden  did  in  a  popular  series  in  Industrial  and  Engineering 
Chemistry and Hahn has been doing in a similar series in Chemtech.

Scientific  investigations  have  served  and  must  continue  to  serve  as  a 
touchstone  for  statistical  research  in  experimental  design.  The  crucial 
insight  that  experimental  design  should  be  a  branch  of  statistics  became 
evident  to  Fisher  because  of  his  close  interaction  with  experimental 
scientists.  As  Box  (1983)  observed  of  Fisher's  work  at  Rothamsted:  "One  can 
clearly  see  the  ideas  of  randomisation,  replication,  orthogonal  arrangement, 
blocking,  factorial  designs,  measurement  of  interactions,  confounding,  all 
developing  in  response  to  the  practical  necessities  of  field  experimentation" 
(p.  5).  New  ideas  in  the  1950's  on  response  surface  methods  and,  in  more 
recent  times,  on  mixture  designs  were  similarly  stimulated  by  the  needs  of 
experimenters. Much of the novel and useful work in experimental design has
been done by statisticians working with or consulting for experimenters in
agriculture and industry.

Statistical  research  exploits  more  or  less  general  mathematical 
abstractions  of  particular  experimental  settings.  Consequently,  it  is  always 
helpful  for  statisticians  to  study  the  experimental  context  that  gives  meaning 
and  relevance  to  the  mathematics  because  it  may  reveal  explicit  or  implicit 
limitations  to  the  theory  that  has  been  developed  up  to  that  point.  We  urge 
research  statisticians  interested  in  breaking  new  ground  in  experimental 
design  to  consult  with  and,  preferably,  work  collaboratively  with 
experimenters  who  are  working  on  worthwhile  projects  —  as  Box  did  at  Imperial 
Chemical  Industries  and  Fisher  did  at  Rothamsted  Experimental  Station. 